Abstract. In this article, we discuss the remarkable connection
between two very different fields, number theory and nuclear physics.
We describe the essential aspects of these fields, the quantities
studied, and how insights in one have been fruitfully applied in the
other. The exciting branch of modern mathematics - random matrix
theory - provides the connection between the two fields. We
assume no detailed knowledge of number theory, nuclear physics,
or random matrix theory; all that is required is some familiarity
with linear algebra and probability theory, as well as some results
from complex analysis. Our goal is to provide the inquisitive reader
with a sound overview of the subjects, placing them in their historical
context in a way that is not traditionally given in the popular
and technical surveys.

1. Summary



In the early 1970's a remarkable connection was unexpectedly discovered
between two very different fields, nuclear physics and number
theory, when it was noticed that random matrix theory accurately modeled
many problems in each. Random matrix theory was first used in
the early 1900's in the study of the statistics of population
characteristics [Wis]. The field developed rapidly in the 1950's when it was
found to describe the spacing distributions of adjacent resonances (of
the same spin and parity) observed in the interaction of low energy
neutrons with nuclei [Wig5], and it flourished in the 1970's following
a chance encounter between Hugh Montgomery and Freeman Dyson
[Mon] (when they saw it also predicted answers to many of the most
difficult problems in number theory).

In this review article we describe the subjects and the quantities 
studied, and how insights in one field have been fruitfully applied 
in the other. We assume no familiarity with either subject; for the 
most part, basic linear algebra and probability theory suffice (though 
we need some results from complex analysis on analytic continuation
and contour integration for some of the number theory calculations).
As there are many mathematical surveys of the subject, as well
as some popular accounts [Ha, Roc] of how the connection between
the fields was noticed, our goal is to explain the broad brushstrokes
of the theory without getting bogged down in the technical details.
For those interested in a more mathematical survey, we recommend
[Con2, Con3, FSV, KaSa2, KeSn3] (see also Section 1.8 of [Meh2]).
Our point is to give the flavor of the subject, and bring these amazing 
connections to the attention of a wide audience. We concentrate on a 
representative sample of results and problems, and urge the interested
reader to sample the bibliography. In particular, though we discuss
many of the current statistics studied, we discuss the computations in
detail only for the classical problem of the density of normalized
eigenvalues of real symmetric matrices and the 1-level density for the
family of Dirichlet L-functions. We chose these examples as the key steps
in the analysis of these problems are similar to many others, but the
mathematical prerequisites to follow the calculations are significantly
lighter.

In particular, our choice means that there are many important topics
which will have only the briefest of mention (if any). To do the field
justice would require a significantly longer article than this. Our hope is
that by keeping the prerequisites modest a large audience will be able
to appreciate the striking similarities between two very different fields,
and get a sense as to the nature of the computations. There are a few
places where real and complex analysis is used (Fourier transforms and
the residue theorem), as well as some abstract algebra or group theory
(mostly group homomorphisms from $(\mathbb{Z}/m\mathbb{Z})^*$ to complex numbers
of absolute value 1). We state all needed results, and when possible
provide brief explanations and proofs. While in the last part of the paper
we concentrate on Dirichlet L-functions, we do mention additional
families of L-functions (the background material is more substantial
here, and we give only the briefest mention of the needed facts).

The paper is organized as follows. In §2 we first give some number
theory preliminaries to set the stage, describing some of the problems
researchers are interested in, and how they are connected with the zeros
of the Riemann zeta function. We mention the famous Riemann
Hypothesis, and how its veracity is related to understanding the prime
numbers. This provides the motivation for studying the behavior of
these zeros. The amazing observation, first noticed in the 1970's, is
that many properties of these zeros can be modeled by random matrix
theory, which had enjoyed a remarkable success in modeling nuclear
physics. We briefly describe random matrix theory and discuss why
it is applicable to so many problems. In §3 we describe some of the
history of nuclear physics, concentrating on the experimental results
which laid the groundwork for the introduction of random matrix theory.
We then sketch the proof of one of the most important results
in the subject, Wigner's semi-circle law, in §4. While other results
are more closely related to the number theory quantities we wish to
study, we give this proof as it highlights in a very accessible manner
the techniques needed to attack a variety of problems. We then return
to number theory in §5 and discuss some (but by no means all!) of
the earliest applications of random matrix theory. We concentrate on
the 1-level density, highlighting the similarities between this calculation
and the proof of Wigner's semi-circle law. We give an interpretation of 
our number theory results in the language of nuclear physics, and then 
conclude with a very brief summary of some of the current avenues 
being explored. 

For the reader: The core of the paper is Sections 2 and 3, where
we describe the number theory problems and the nuclear physics history
which led to the development of random matrix theory, as well as
briefly summarizing random matrix theory. Sections 4 and 5 are more
advanced (especially the latter), where we give details of the calculations.
Many of the more technical comments and some proofs of claims
are relegated to the footnotes; these may safely be skipped by the reader 
interested in the broad brushstrokes of the theory and subject. For the 
benefit of the reader, we have also included in the footnotes definitions 
and explanations of much of the assumed background material to help 
keep the paper accessible. 

2. Introduction 

2.1. Number Theory Preliminaries. The primes^1 are the building
blocks of number theory: every integer greater than 1 can be written
uniquely as a product of prime powers [HW].^2 One of the most important
questions we can ask about primes is also one of the most basic: how many
primes are there at most $x$? In other words, how many building blocks
are there up to a given point?

Euclid proved over 2000 years ago that there are infinitely many
primes; so, if we let $\pi(x)$ denote the number of primes at most $x$, we
know $\lim_{x\to\infty} \pi(x) = \infty$. Euclid's proof is still used in
courses around the world.^3 Can we do better? How rapidly does $\pi(x)$
go to infinity?



^1 An integer $n \ge 2$ is prime if the only positive integers that divide it are 1 and
itself; if $n \ge 2$ is not prime we say it is composite. The number 1 is neither prime
nor composite, but instead is called a unit.

^2 This property is called unique factorization, and is one of the most important
properties of prime numbers. If 1 were considered a prime, then unique factorization
would fail; for example, if 1 were prime then $3^5 \cdot 7$ and $1^{2009} \cdot 3^5 \cdot 7$ would be two
different factorizations of 1701.

^3 The proof is by contradiction. Assume there are only finitely many primes.
We label them $p_1, p_2, \dots, p_n$. Consider the number $m = p_1 p_2 \cdots p_n + 1$. Either
this is a new prime not on our list, or it is composite. If it is composite, it must
be divisible by some prime; however, it cannot be divisible by any prime on our
list, as division by each of these leaves remainder 1. Thus $m$ is either prime or
divisible by a prime not on our list. In either case, our list was incomplete, proving
that there are infinitely many primes. With a little bit of effort, one can show that Euclid's

In particular, what can we say about $\lim_{x\to\infty} \pi(x)/x$, which represents
the probability that a number at most $x$ is prime?
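
As a quick numerical look at this question, the following minimal Python sketch counts primes with the sieve of Eratosthenes and prints $\pi(x)/x$ next to $1/\log x$, the comparison scale that the Prime Number Theorem (stated next) makes precise. The function names `primes_up_to` and `prime_pi` and the sample values of $x$ are our own illustrative choices, not part of the classical theory.

```python
import math


def primes_up_to(limit):
    """Return a list of all primes <= limit using the sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for n in range(2, math.isqrt(limit) + 1):
        if is_prime[n]:
            # Cross off the multiples of n, starting at n*n.
            is_prime[n * n :: n] = bytearray(len(range(n * n, limit + 1, n)))
    return [n for n in range(limit + 1) if is_prime[n]]


def prime_pi(x):
    """pi(x): the number of primes at most x."""
    return len(primes_up_to(int(x)))


if __name__ == "__main__":
    for x in (10**2, 10**3, 10**4, 10**5, 10**6):
        count = prime_pi(x)
        print(f"x={x:>8}  pi(x)={count:>6}  "
              f"pi(x)/x={count / x:.5f}  1/log(x)={1 / math.log(x):.5f}")
```

Both printed columns decay slowly and at a comparable rate, which is the qualitative content of the statement that the proportion of primes up to $x$ tends to zero like $1/\log x$.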

The answer is given by the Prime Number Theorem, which states
the number of primes at most $x$ is $\mathrm{Li}(x) + o(\mathrm{Li}(x))$, where
$\mathrm{Li}(x) = \int_2^x dt/\log t$ and for $x$ large, $\mathrm{Li}(x)$ is approximately
$x/\log x$.^4 While it
is possible to prove the prime number theorem elementarily [Erd, Sel2],
the most informative proofs use complex numbers^5 and complex analysis,
and lead to the fascinating connection between number theory and
nuclear physics. One of the most fruitful approaches to understanding
the primes is to understand properties of the Riemann zeta function,
$\zeta(s)$, which is defined for $\Re(s) > 1$ by

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}; \tag{2.1}$$

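the series converges for $\Re(s) > 1$ by the integral test.^6 By unique
factorization, it turns out that we may also write $\zeta(s)$ as a product
over primes;^7 this is called the Euler product of $\zeta(s)$, and is one of its
most important properties:^8

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{p\ \mathrm{prime}} \left(1 - \frac{1}{p^s}\right)^{-1}. \tag{2.2}$$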
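
The equality of the sum over integers and the product over primes can be checked numerically at a point where both converge quickly. The sketch below compares a truncated sum and a truncated product at $s = 2$ with the known value $\pi^2/6$. The function names and the cutoffs are illustrative assumptions of ours; truncation means the agreement is only approximate.

```python
import math


def zeta_partial_sum(s, n_terms):
    """Sum of 1/n^s over 1 <= n <= n_terms (a truncation of the series)."""
    return sum(1.0 / n**s for n in range(1, n_terms + 1))


def euler_partial_product(s, prime_bound):
    """Product of (1 - p^(-s))^(-1) over primes p <= prime_bound."""
    product = 1.0
    for p in range(2, prime_bound + 1):
        # Trial division: p is prime if no d in [2, sqrt(p)] divides it.
        if all(p % d for d in range(2, math.isqrt(p) + 1)):
            product *= 1.0 / (1.0 - p ** (-s))
    return product


if __name__ == "__main__":
    s = 2
    print("truncated sum     :", zeta_partial_sum(s, 100_000))
    print("truncated product :", euler_partial_product(s, 10_000))
    print("pi^2 / 6          :", math.pi**2 / 6)
```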



argument proves there is a $C > 0$ such that $\pi(x) > C \log\log x$. The first few primes
generated in this manner are 2, 3, 7, 43, 13, 53, 5, 6221671, 38709183810571, 139, 
2801. A fascinating question is whether or not every prime is eventually listed (see 

my 

^4 The notation $f(x) = o(g(x))$ means $\lim_{x\to\infty} f(x)/g(x) = 0$.

^5 A complex number $z \in \mathbb{C}$ is of the form $z = x + iy$. We call $x$ the real part and
$y$ the imaginary part; we frequently denote these by $\Re(z)$ and $\Im(z)$, respectively.

^6 Let $s = \sigma + it$. Then $|\zeta(s)| \le \sum_{n=1}^{\infty} 1/n^{\sigma}$. The integral test from calculus states
that this series converges if and only if $\int_1^{\infty} x^{-\sigma} dx$ converges, and this integral
converges if $\sigma > 1$.

^7 To see this, use the geometric series formula (see Footnote 9) to expand
$(1 - p^{-s})^{-1}$ as $\sum_{k=0}^{\infty} p^{-ks}$ and note that $n^{-s}$ occurs exactly once on each side (and
clearly every term from expanding the product is of the form $n^{-s}$ for some $n$).

^8 We give two quick illustrations of the importance of the Euler product by showing how
it implies there are infinitely many primes. The first is that $\sum 1/n^s \to \infty$ as $s \to 1^+$,
which means there must be infinitely many primes as otherwise the product would be
finite. The second is to note that $\sum 1/n^2 = \pi^2/6$ is irrational; if there were only
finitely many primes then the product would be rational. See for example [MT-B]
for details.






FRANK W. K. FIRK AND STEVEN J. MILLER 



Initially defined only for $\Re(s) > 1$, using complex analysis the Riemann
zeta function can be meromorphically continued^9 to all of $\mathbb{C}$, having
only a simple pole with residue 1 at $s = 1$. It satisfies the functional
equation^10 [MT-B]

$$\xi(s) = \frac{1}{2} s(s-1) \Gamma\left(\frac{s}{2}\right) \pi^{-s/2} \zeta(s) = \xi(1-s). \tag{2.3}$$

The distribution of the primes is a difficult problem; however, the
distribution of the positive integers is not, and has been completely known
for quite some time! The hope is that we can understand $\zeta(s)$, as
this involves sums over the integers, and somehow pass this knowledge
on to the primes through the Euler product (see Footnote 8 for two
examples).

Riemann [Ri] (see [Ed] for an English translation) observed a fascinating
connection between the zeros of $\zeta(s)$ and the error term in the
prime number theorem. As this relation is the starting point for our
story, we describe the details at some length in the next paragraph.
This part is a bit more technical and relies on complex analysis. The
reader may safely skip most of the next paragraph; the key piece for the
rest of the paper is (2.8), where we show how the primes are connected
to the zeros of $\zeta(s)$ (the function $\Lambda(n)$ which appears is defined in (2.4)).

One of the most natural things to do to a complex function is to take
contour integrals of its logarithmic derivative [La, SS]; this will yield
information about zeros and poles (we'll see later that we can get even
more information if we weight the integral with a test function). There
are two expressions for $\zeta(s)$; however, for the logarithmic derivative it
is clear that we should use the Euler product over the sum expansion,



^9 The subject of meromorphic continuation belongs to complex analysis. For the
benefit of the reader who hasn't seen this, we give a brief example that will be of use
throughout this paper, namely the geometric series formula $1 + r + r^2 + r^3 + \cdots =
1/(1 - r)$. Note that while the sum makes sense only when $|r| < 1$, $1/(1 - r)$ is
well-defined for all $r \ne 1$ and agrees with the sum whenever $|r| < 1$. We say $1/(1 - r)$
is a meromorphic continuation of the sum.

^10 One proof is to use the Gamma function, $\Gamma(s) = \int_0^{\infty} e^{-t} t^{s-1} dt$. A simple
change of variables gives $\int_0^{\infty} x^{s/2 - 1} e^{-n^2 \pi x} dx = \Gamma(s/2) / (n^s \pi^{s/2})$. Summing over $n$
represents a multiple of $\zeta(s)$ as an integral. After some algebra we find
$\pi^{-s/2} \Gamma(s/2) \zeta(s) = \int_1^{\infty} x^{s/2 - 1} \omega(x) dx + \int_1^{\infty} x^{-s/2 - 1} \omega(1/x) dx$,
with $\omega(x) = \sum_{n=1}^{\infty} e^{-n^2 \pi x}$. Using Poisson
summation, we find $\omega(1/x) = -\frac{1}{2} + \frac{1}{2} x^{1/2} + x^{1/2} \omega(x)$, which yields
$\pi^{-s/2} \Gamma(s/2) \zeta(s) = \frac{1}{s(s-1)} + \int_1^{\infty} \left(x^{s/2 - 1} + x^{-s/2 - 1/2}\right) \omega(x) dx$,
from which the claimed functional equation follows.

as the logarithm of a product is the sum of the logarithms. Let

$$\Lambda(n) = \begin{cases} \log p & \text{if } n = p^r \text{ for some prime } p \text{ and integer } r \ge 1 \\ 0 & \text{otherwise.} \end{cases} \tag{2.4}$$



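We find

$$\frac{\zeta'(s)}{\zeta(s)} = -\sum_{p} \frac{\log p \cdot p^{-s}}{1 - p^{-s}} = -\sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s} \tag{2.5}$$

(this is proved by using the geometric series formula to write $(1 - p^{-s})^{-1}$
as $\sum_{k=0}^{\infty} 1/p^{ks}$, collecting terms and then using the definition of $\Lambda(n)$).
Moving the negative sign over and multiplying by $x^s/s$, we find

$$-\frac{1}{2\pi i} \int_{\Re(s) = c} \frac{\zeta'(s)}{\zeta(s)} \frac{x^s}{s}\, ds = \frac{1}{2\pi i} \int_{\Re(s) = c} \sum_{n=1}^{\infty} \Lambda(n) \left(\frac{x}{n}\right)^s \frac{ds}{s}, \tag{2.6}$$

where we are integrating over some line $\Re(s) = c > 1$. For each $n$, the integral
on the right hand side is 1 if $n < x$ and 0 if $n > x$ (by choosing $x$
non-integral, we don't need to worry about $x = n$), and thus gives
$\sum_{n \le x} \Lambda(n)$. By shifting contours and keeping track of the poles and
zeros of $\zeta(s)$, the residue theorem^11 [La, SS] implies that the left hand
side is

$$x - \sum_{\rho:\, \zeta(\rho) = 0} \frac{x^{\rho}}{\rho}; \tag{2.7}$$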

the $x$ term comes from the pole of $\zeta(s)$ at $s = 1$ (remember we count
poles with a minus sign), while the $x^{\rho}/\rho$ term arises from zeros; in both
cases we must multiply by the residue, which is $x^{\rho}/\rho$ (it can be shown
that $\zeta(s)$ has neither a zero nor a pole at $s = 0$).^12 The Riemann
zeta function vanishes whenever $\rho$ is a negative even integer; we call
these the trivial zeros. These terms contribute $\sum_{k=1}^{\infty} x^{-2k}/(2k) =
-\frac{1}{2} \log(1 - x^{-2})$. This leads to the following beautiful formula:

$$x - \sum_{\substack{\rho:\, \Re(\rho) \in (0,1) \\ \zeta(\rho) = 0}} \frac{x^{\rho}}{\rho} - \frac{1}{2} \log(1 - x^{-2}) = \sum_{n \le x} \Lambda(n). \tag{2.8}$$

^11 Let $f$ be a meromorphic function with only finitely many poles on an open
set $U$ which is bounded by a 'nice' curve $\gamma$. Thus at each point $z_0 \in U$ we have
$f(z) = \sum_{n=N}^{\infty} a_n (z - z_0)^n$ with $N > -\infty$. If $N > 0$ we say $f$ has a zero of order
$N$. If $N < 0$ we say $f$ has a pole of order $-N$, and in this case we call $a_{-1}$ the
residue of $f$ at $z_0$ (for clarity, we often denote this by $\mathrm{Res}(f, z_0)$). If $f$ does not
have a pole at $z_0$, then the residue is zero. Our assumption implies that there are
only finitely many points where the residue is non-zero. The residue theorem states
$\frac{1}{2\pi i} \int_{\gamma} f(z) dz = \sum_{z \in U} \mathrm{Res}(f, z)$. One useful variant is to apply this to $f'(z)/f(z)$,
which then counts the number of zeros of $f$ minus the number of poles; another
is to look at $f'(z)/f(z) \cdot g(z)$ where $g(z)$ is analytic, which we will do later when
stating the explicit formula for the zeros of the Riemann zeta function.

^12 Some care is required with this sum, as $\sum 1/|\rho|$ diverges. The solution involves
pairing the contribution from $\rho$ with $\overline{\rho}$; see for example [Da].
If we write $n$ as $p^r$, the contribution from all pieces with $r \ge 2$ is
bounded by $2x^{1/2} \log x$ for $x$ large;^13 thus we really have a formula for
the sum of the primes at most $x$, with the prime $p$ weighted by $\log p$.
Through partial summation, knowing the weighted sum is equivalent
to knowing the unweighted sum.^14
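
The weighted count $\sum_{n \le x} \Lambda(n)$ is easy to experiment with. The sketch below, a rough illustration rather than an efficient algorithm, takes $\Lambda(n) = \log p$ when $n$ is a power of a prime $p$ and 0 otherwise. It compares the full weighted sum (called `psi` here), the part coming from primes alone (`theta`), the main term $x$, and the bound $2x^{1/2}\log x$ on the prime-power pieces. All function names are our own hypothetical choices.

```python
import math


def smallest_prime_factor(n):
    """Smallest divisor d >= 2 of n (equal to n itself when n is prime)."""
    return next((d for d in range(2, math.isqrt(n) + 1) if n % d == 0), n)


def von_mangoldt(n):
    """Lambda(n): log p if n = p^r for a prime p and integer r >= 1, else 0."""
    if n < 2:
        return 0.0
    p = smallest_prime_factor(n)
    while n % p == 0:
        n //= p
    # n was a pure power of p exactly when nothing is left over.
    return math.log(p) if n == 1 else 0.0


def psi(x):
    """Sum of Lambda(n) over n <= x (primes and prime powers)."""
    return sum(von_mangoldt(n) for n in range(2, int(x) + 1))


def theta(x):
    """Sum of log p over primes p <= x only (the r = 1 terms)."""
    return sum(math.log(n) for n in range(2, int(x) + 1)
               if smallest_prime_factor(n) == n)


if __name__ == "__main__":
    for x in (10, 100, 1000, 10000):
        total, primes_only = psi(x), theta(x)
        bound = 2 * math.sqrt(x) * math.log(x)
        print(f"x={x:>6}  psi={total:10.3f}  theta={primes_only:10.3f}  "
              f"psi-theta={total - primes_only:8.3f}  bound={bound:9.3f}")
```

In the printed table the weighted sum tracks $x$, and the difference between the two sums (the contribution of the pieces with $r \ge 2$) is small compared with $x$.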

We can now see the connection between the zeros of the Riemann
zeta function and counting primes at most $x$. The contribution from
the trivial zeros is well-understood, and is just $-\frac{1}{2} \log(1 - x^{-2})$. The
remaining zeros, whose real parts are in $[0, 1]$, are called the non-trivial
or critical zeros. They are far more important and more mysterious.
The smaller the real part of these zeros of $\zeta(s)$, the smaller the error.
Due to the functional equation, however, if $\zeta(\rho) = 0$ for a critical zero
$\rho$ then $\zeta(1 - \rho) = 0$ as well.^15 Thus the 'smallest' the largest real part
can be is 1/2. The assertion that every critical zero has real part exactly 1/2
is the celebrated Riemann Hypothesis.^16 It has a plethora
of applications throughout number theory and mathematics; counting
primes is but one of many.^17 It is clear, however, that the distribution
of the zeros of the Riemann zeta function will be of primary (in both
senses of the word!) importance.

^13 To see this, note $\sum_{p^2 \le x} \log p \le x^{1/2} \log x$, while the contribution from $n = p^r$
with $r \ge 3$ is bounded by $\sum_{r \ge 3} \sum_{p^r \le x} \log p \le x^{1/3} \cdot 2 \log^2 x$ (this is because $p^r \le x$
implies $r \le \log_2 x \le 2 \log x$).

^14 Partial summation is the discrete analogue of integration by parts [MT-B]. In
our case, $\sum_{p \le x} \log p \sim x$ is equivalent to $\sum_{p \le x} 1 \sim x/\log x$.

^15 Note this is only true for zeros in the critical strip, namely $0 < \Re(\rho) < 1$; for
zeros outside the critical strip we can and do have zeros of $\zeta(s)$ not corresponding
to zeros of $\zeta(1 - s)$ because of poles of the Gamma function.

^16 The Riemann Hypothesis is probably the most important mathematical aside
ever in a paper. Riemann [Ed, Ri] wrote (translated into English; note when he
talks about the roots being real, he's writing the roots as $1/2 + i\gamma$, and thus $\gamma \in \mathbb{R}$ is
the Riemann Hypothesis): ...and it is very probable that all roots are real. Certainly
one would wish for a stricter proof here; I have meanwhile temporarily put aside the
search for this after some fleeting futile attempts, as it appears unnecessary for the
next objective of my investigation. Though not mentioned in the paper, Riemann
had developed a terrific formula for computing the zeros of $\zeta(s)$, and had checked
(but never reported!) that the first few were on the critical line $\Re(s) = 1/2$. His
numerical computations were only discovered decades later when Siegel was looking
through Riemann's papers.

^17 The prime number theorem is in fact equivalent to the statement that $\Re(\rho) < 1$
for any zero of $\zeta(s)$. The prime number theorem was first proved independently by

If we assume the Riemann Hypothesis, all the zeros in the critical
strip ($0 < \Re(\rho) < 1$) lie on the critical line $\Re(s) = 1/2$, and
it makes sense to talk about the distribution of spacings between adjacent
zeros. The purpose of this note is to discuss one of the most powerful
models used to predict the behavior of these zeros, namely random
matrix theory. While other methods have since been developed, random
matrix theory (which we describe in the next subsection) was
the first to make truly accurate, testable predictions. The general
idea is that the behavior of zeros of the Riemann zeta function is
well-modeled by the behavior of eigenvalues of certain matrices. This
idea had previously been successfully used to model the distribution of
energy levels of heavy nuclei (some of the fundamental papers and
books on the subject, ranging from experiments to theory, include
[BFFMPWl [Dm |Dg| |DgllFlMlFmiFKPTllGain[^ IHIl 
SI [WIgTt [Wig2| [Wig3| [Wipt [Wi^ We 



iMihTl Meh2\ MG 



describe the development of random matrix theory in nuclear physics
in detail in the next section, and then delve into more of the details of
the connection between the two subjects in §4 and §5.



2.2. Random Matrix Theory Preliminaries. Before describing what
we mean by random matrix theory and random matrix ensembles (i.e.,
sets of matrices), we quickly review the needed analysis and probability
material, and then in the next subsection discuss why random matrix
theory is so well suited to modeling a variety of problems.

Let $p(x)$ be a continuous or discrete probability distribution. For
notational convenience we assume $p$ is continuous and use integral
notation below, though similar statements hold in the discrete case.
This means $p(x) \ge 0$, $\int_{-\infty}^{\infty} p(x) dx = 1$, and if $X$ is a random variable
with density $p$, then the probability $X$ takes on a value in $[a, b]$ is just
$\int_a^b p(x) dx$.



Hadamard [Had] and de la Vallée Poussin [dlVP] in 1896. Each proof crucially used
results from complex analysis, which is hardly surprising given that Riemann had
shown $\pi(x)$ is related to the zeros of the meromorphic function $\zeta(s)$. It was not until
almost 50 years later that Erdős [Erd] and Selberg [Sel2] obtained elementary proofs
of the prime number theorem (in other words, proofs that did not use complex
analysis, which was quite surprising as the prime number theorem was known to
be equivalent to a statement about zeros of a meromorphic function). See [Gol2]
for some commentary on the history of elementary proofs.

Let $X$ be a random variable with density $p$. We define the $k$-th moment
of $p$, denoted $\mu_k$ or $E[X^k]$, by

$$\mu_k = \int_{-\infty}^{\infty} x^k p(x)\, dx. \tag{2.9}$$

The zeroth moment is always 1, and the first moment is called the mean.
The second moment is related to the variance. Recall the variance
is defined by

$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 p(x)\, dx, \tag{2.10}$$

and equals the second moment if the mean is zero. For convergence
issues, we typically are interested in random variables with zero mean,
variance 1 and finite higher moments.^18 While at first it might seem
restrictive to assume we have mean 0 and variance 1, this is actually
equivalent to the first and second moments being finite.^19 The moments
are extremely important for understanding a density. While it is not
the case that the moments uniquely determine a probability
distribution,^20 they do for sufficiently nice distributions. The situation is
similar to the theory of Taylor series. It is sadly not the case that every
'nice' function agrees with its Taylor series in an arbitrarily small
neighborhood about the point of expansion, even if by 'nice' we mean
infinitely differentiable.^21 See [ShTa, Si] for more details.
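
To make the moment definitions concrete, here is a small numerical sketch. It approximates the integrals $\int x^k p(x) dx$ by the midpoint rule for one example density and then rescales that density to have mean 0 and variance 1. The example (an exponential density with rate 2), the integration window and step count, and the helper names `moment` and `standardize` are all assumptions made for illustration. The rescaling used is the standard one: if $X$ has density $p$, mean $\mu$ and standard deviation $\sigma$, then $(X - \mu)/\sigma$ has density $\sigma p(\sigma x + \mu)$.

```python
import math


def moment(density, k, lo=-5.0, hi=45.0, steps=100_000):
    """Approximate the k-th moment, the integral of x^k * density(x) over
    [lo, hi], by the midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += x**k * density(x)
    return total * h


def standardize(density, mean, std):
    """Density of (X - mean) / std when X has the given density."""
    return lambda x: std * density(std * x + mean)


def exponential_density(x, rate=2.0):
    """Example density: rate * exp(-rate * x) for x >= 0, and 0 otherwise."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


if __name__ == "__main__":
    p = exponential_density
    mean = moment(p, 1)
    variance = moment(p, 2) - mean**2
    print(f"original: mean={mean:.4f}  variance={variance:.4f}")

    g = standardize(p, mean, math.sqrt(variance))
    for k in range(5):
        print(f"standardized moment {k}: {moment(g, k):.4f}")
```

After standardizing, the zeroth, first and second moments carry no information (they are 1, 0 and 1 up to numerical error), so the third and higher moments are the first ones that distinguish this density from any other standardized density.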



^18 By this we mean $\int_{-\infty}^{\infty} |x|^k p(x) dx < \infty$. If the integral is not absolutely
convergent, then the value could depend on how we tend to infinity. The standard
example is the Cauchy distribution, $p(x) = (\pi(1 + x^2))^{-1}$. Note $\int_{-\infty}^{\infty} |x| p(x) dx = \infty$,
$\lim_{A\to\infty} \int_{-A}^{A} x p(x) dx = 0$ and $\lim_{A\to\infty} \int_{-A}^{2A} x p(x) dx = \pi^{-1} \log 2$.

^19 The reason is we can always adjust our probability distribution, in this case,
to have mean 0 and variance 1 by simple translations and rescaling. For example,
if the density $p$ has mean $\mu$ and variance $\sigma^2$, then $g(x) = \sigma p(\sigma x + \mu)$ has mean 0
and variance 1. Thus the third moment (or the fourth if the third vanishes) is
the first moment that truly shows the 'shape' of the density.

^20 For $x \in [0, \infty)$, consider $f_1(x) = (2\pi x^2)^{-1/2} \exp(-(\log x)^2/2)$ and $f_2(x) =
f_1(x) [1 + \sin(2\pi \log x)]$. For $r \in \mathbb{N}$, the $r$-th moment of $f_1$ and $f_2$ is $\exp(r^2/2)$. The
reason for the non-uniqueness of moments is that the moment generating function
$M_f(t) = \int_{-\infty}^{\infty} \exp(tx) f(x) dx$ does not converge in a neighborhood of the origin. See
Eini], Chapter 2.

^21 The standard example is the function $f(x) = \exp(-1/x^2)$ if $|x| > 0$ and 0
otherwise. By using the definition of the derivative and L'Hôpital's rule, we see
$f^{(n)}(0) = 0$ for all $n$, but clearly this function is non-zero if $|x| > 0$. Thus the Taylor
series about the origin agrees with the function only at the origin! This example
illustrates how much harder real analysis can
be than complex analysis. There, if a function of a complex variable has even one
derivative then it has infinitely many, and is given by its Taylor series in some
neighborhood of the point.



NUCLEI, PRIMES AND THE RANDOM MATRIX CONNECTION 






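We can now describe random matrix theory and the ensembles we'll
study. Consider a real symmetric matrix $A$, so

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1N} \\ a_{12} & a_{22} & a_{23} & \cdots & a_{2N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{1N} & a_{2N} & a_{3N} & \cdots & a_{NN} \end{pmatrix} = A^T.$$

We fix a density $p$, and define

$$\mathrm{Prob}(A) = \prod_{1 \le i \le j \le N} p(a_{ij}).$$

This means

$$\mathrm{Prob}\left(A : a_{ij} \in [\alpha_{ij}, \beta_{ij}]\right) = \prod_{1 \le i \le j \le N} \int_{x_{ij} = \alpha_{ij}}^{\beta_{ij}} p(x_{ij})\, dx_{ij}.$$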



The goal is to understand properties of the eigenvalues of $A$. We
accomplish this by studying a related measure where we place point
masses at the normalized eigenvalues. We use the Dirac delta functional
$\delta(x - x_0)$, which is a unit point mass at $x_0$. This means
$\int f(x) \delta(x - x_0) dx = f(x_0)$.^22

To each real symmetric matrix $A$, we attach a probability measure^23

$$\mu_{A,N}(x) = \frac{1}{N} \sum_{i=1}^{N} \delta\left(x - \frac{\lambda_i(A)}{2\sqrt{N}}\right); \tag{2.11}$$

in §4.1 we'll see why we are normalizing the eigenvalues as we have
done here. This measure counts the fraction of normalized eigenvalues
in an interval:

$$\int_a^b \mu_{A,N}(x)\, dx = \frac{\#\left\{\lambda_i : \frac{\lambda_i(A)}{2\sqrt{N}} \in [a, b]\right\}}{N}. \tag{2.12}$$

^22 We can consider the probability densities $A_N(x) dx$, where $A$ is a probability
density and $A_N(x) = N \cdot A(Nx)$. As $N \to \infty$ almost all the probability (mass)
is concentrated in a narrower and narrower band about the origin; we let $\delta(x)$ be
the limit with all the mass at one point. It is a discrete (as opposed to continuous)
probability measure, with infinite density but finite mass. Note that $\delta(x - x_0)$ acts
like a unit point mass; however, instead of having its mass concentrated at the
origin, it is now concentrated at $x_0$.

^23 As $A$ is real symmetric, the eigenvalues are real (see Footnote 19) and thus
this measure is well defined.

Using the definition of the Dirac delta functional, the $k$-th moment,
which we denote $M_{A,N}(k)$, is readily computed:

$$M_{A,N}(k) = \frac{\sum_{i=1}^{N} \lambda_i(A)^k}{2^k N^{\frac{k}{2} + 1}}. \tag{2.13}$$

While this is a nice, explicit formula for the $k$-th moment, it seems
useless as we do not know the location of the eigenvalues of $A$; we will
see in §4 that this is not the case at all.
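
A small experiment hints at why the moment formula is usable without knowing any eigenvalue. The sketch below assumes the $k$-th moment is $\sum_i \lambda_i(A)^k / (2^k N^{k/2+1})$ and uses the standard linear algebra fact that $\sum_i \lambda_i(A)^k = \mathrm{Trace}(A^k)$, so only matrix entries are needed. The choice of standard normal entries, the size $N = 100$, the seed, and all function names are illustrative assumptions of ours.

```python
import random


def random_real_symmetric(n, rng):
    """n x n real symmetric matrix whose entries on and above the diagonal
    are independent standard normals (mean 0, variance 1)."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            a[i][j] = a[j][i] = rng.gauss(0.0, 1.0)
    return a


def mat_mul(a, b):
    """Plain matrix product of two square matrices given as lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a]


def trace_of_power(a, k):
    """Trace(A^k) for an integer k >= 1, by repeated multiplication."""
    power = a
    for _ in range(k - 1):
        power = mat_mul(power, a)
    return sum(power[i][i] for i in range(len(a)))


def moment_from_trace(a, k):
    """k-th moment of the normalized eigenvalue measure, computed as
    Trace(A^k) / (2^k * N^(k/2 + 1)) without finding any eigenvalue."""
    n = len(a)
    return trace_of_power(a, k) / (2**k * n ** (k / 2 + 1))


if __name__ == "__main__":
    rng = random.Random(2009)
    a = random_real_symmetric(100, rng)
    print("second moment:", moment_from_trace(a, 2))
    print("fourth moment:", moment_from_trace(a, 4))
```

For entries with mean 0 and variance 1, the printed values should land near 1/4 and 1/8, the second and fourth moments of the semi-circle density $\frac{2}{\pi}\sqrt{1 - x^2}$ on $[-1, 1]$, which is the limiting density in Wigner's semi-circle law.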

There are many other ensembles of matrices worth studying. In addition
to real symmetric matrices, complex Hermitian and symplectic are
frequently studied.^24 In this paper we concentrate on real symmetric
matrices.

Random matrix theory models the behavior of a system by an appropriate
set of matrices. Specifically, we calculate some quantity (say
the probability two normalized eigenvalues are less than half the average
spacing apart) for each matrix and then average over all matrices
in our family. The hope, which is borne out in many cases (ranging
from number theory to nuclear physics to bus routes in Cuernavaca,
Mexico^25), is that these system averages are close to the behavior of the
system of interest. We describe this correspondence in greater detail
below.

^24 These ensembles have behavior that is often described by a parameter $\beta$, which
is 1 for real symmetric, 2 for complex Hermitian and 4 for symplectic matrices.

^25 See [BBDS, KrSe]. We quote from [BBDS], who quote from [KrSe]: There is
no covering company responsible for organizing the city transport. Consequently,
constraints such as a time table that represents external influence on the transport
do not exist. Moreover, each bus is the property of the driver. The drivers try to
maximize their income and hence the number of passengers they transport. This
leads to competition among the drivers and to their mutual interaction. It is known
that without additive interaction the probability distribution of the distances between
subsequent buses is close to the Poissonian distribution and can be described by the
standard bus route model.... A Poisson-like distribution implies, however, that the
probability of close encounters of two buses is high (bus clustering) which is in
conflict with the effort of the drivers to maximize the number of transported passengers
and accordingly to maximize the distance to the preceding bus. In order to avoid the
unpleasant clustering effect the bus drivers in Cuernavaca engage people who record
the arrival times of buses at significant places. Arriving at a checkpoint, the driver
receives the information of when the previous bus passed that place. Knowing the
time interval to the preceding bus the driver tries to optimize the distance to it by
either slowing down or speeding up. The papers go on to show the behavior is
well-modeled by random matrix theory (specifically, ensembles of complex Hermitian
matrices)!

2.3. Why Random Matrix Theory. Why do random matrix models
have a chance of giving useful answers to questions in nuclear physics
and other subjects? We consider one of the central problems of classical
mechanics, namely the orbits in a solar system. It is possible to write
down a closed form solution in the special case when there are just
two point masses interacting through gravity.^26 The three body problem,
however, defies closed form solutions.^27 On physical grounds we
know of course that there is a solution; however, for our solar system we
cannot analyze the solution well enough to determine whether or not
billions of years from now Pluto will escape from the sun's influence.^28

As difficult as the above problem is, the situation is significantly
worse when we try to understand the behavior of heavy nuclei. Uranium,
for instance, has over 200 protons and neutrons in its nucleus,
each subject to and contributing to complex forces. If we completely
understood the theory of the nucleus, we could predict the energy levels;
sadly, we are far from a complete understanding! As we'll see in
the next section, physicists were able to gain some insights into the
nuclear structure by shooting neutrons into the nucleus and analyzing
the results; however, a complete understanding of the nucleus was, and
still is, lacking.

Figure 1. Molecules in a box

^26 Explicitly, given two points with masses $m_1$ and $m_2$ and initial velocities $\vec{v}_1$
and $\vec{v}_2$ and located at $\vec{r}_1$ and $\vec{r}_2$, we can describe how the system evolves in time
given that gravity is the only force in play.

^27 While there are known solutions for special arrangements of special masses,
three bodies in general position is still open; see [Wh] for more details.

^28 Whether or not Pluto will regain planetary status is an entirely different
question.

How should we attack such a problem? It's useful to recall other
complex problems from physics and how they were successfully modeled.
We consider a standard problem in statistical mechanics, namely
calculating the pressure on a wall. Consider the box in Figure 1. For
simplicity we assume that every molecule is moving either left or right,
and all are traveling at the same speed. If we want to calculate the
pressure on the left wall, we need to know how many particles strike
the wall in an infinitesimal time. Thus we need to know how many particles
are close to the left wall and moving towards it. In a room there
would be at least a mole (about $6.022 \cdot 10^{23}$) of air molecules, which
means that this computation is well beyond our abilities. Without going
into all of the physics (see for example [Re]), we can get a rough
idea of what is happening. The complexity, the enormous number of
configurations of positions of the molecules, actually helps us. For each
configuration we can calculate the pressure due to that configuration.
We then average over all configurations, and it turns out that a generic
configuration is close to the system average. This theory has enjoyed
great success, and suggests a way to model nuclear physics.

Returning to our problem about heavy nuclei, from quantum me- 
chanics we have the following equation governing our problem: 

$$H\psi_n = E_n\psi_n, \tag{2.14}$$

where $H$ is the Hamiltonian, whose entries depend on the system, the $E_n$ are
the energy levels and the $\psi_n$ are the energy eigenfunctions. Thus we have
'reduced' nuclear physics to linear algebra. Unfortunately, there are 
two difficulties with this approach. The first is that H is an infinite 
dimensional matrix, and the second is that we do not know any of 
the entries! This makes for quite a daunting task! Wigner's great in- 
sight was that this enormous complexity is similar to what we saw in 
Statistical Mechanics, and actually helps us. The interactions are so 
complex we might as well regard each entry as some randomly chosen 
number. Thus instead of considering the true $H$ for the system, we consider
$N \times N$ real symmetric matrices with entries independently chosen
from nice probability distributions. We compute whatever statistics we
are interested in for these matrices, average over all matrices, and then
take the $N \to \infty$ scaling limit. The main result is that the behavior
of the eigenvalues of an arbitrary matrix is often well
approximated by the behavior obtained by averaging over all
matrices, and this is a good model for many systems, ranging
from the energy levels of heavy nuclei to the zeros of the
Riemann zeta function.



This is reminiscent of the Central Limit Theorem. For example, if we average 
over all sequences of tossing a fair coin 2N times, we obtain N heads, and most 
sequences of 2N tosses will have approximately N heads, where approximately 
means deviations on the order of $\sqrt{N}$.
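The coin-tossing analogy is easy to test numerically. The following is a minimal sketch (the function name, trial counts, and seed are our illustrative choices, not part of the original discussion): it tosses a fair coin $2N$ times, repeats this many times, and reports the root-mean-square deviation of the number of heads from $N$. If the analogy holds, this deviation should grow like $\sqrt{N}$; for fair coins the ratio to $\sqrt{N}$ should be close to $\sqrt{1/2} \approx 0.707$.

```python
import math
import random


def rms_head_deviation(n, trials, seed=0):
    """RMS deviation of the head count from n over many runs of 2n fair tosses."""
    rng = random.Random(seed)
    total_sq = 0.0
    for _ in range(trials):
        # getrandbits(2n) produces 2n fair bits; counting the 1-bits counts the heads
        heads = bin(rng.getrandbits(2 * n)).count("1")
        total_sq += (heads - n) ** 2
    return math.sqrt(total_sq / trials)


if __name__ == "__main__":
    for n in (100, 400, 1600):
        rms = rms_head_deviation(n, trials=2000)
        # the last column should stay roughly constant as n grows
        print(n, round(rms, 2), round(rms / math.sqrt(n), 3))
```

The point of the experiment is that a typical sequence behaves like the average over all sequences, up to fluctuations that are small relative to $N$.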




3. Nuclear Physics History 

Below we discuss some of the history of investigations of the nucleus, 
concentrating on the parts that led to the introduction of random ma- 
trix theory to the subject. We mention some of the connections with 
number theory, which will be explored in much greater detail later. 



3.1. Introduction. The Riemann Hypothesis asserts that the non-trivial
zeros of the Riemann zeta function are of the form $\rho = 1/2 + i\gamma_\rho$
with $\gamma_\rho$ real. About the year 1913, Polya conjectured that the $\gamma_\rho$ are
the eigenvalues of a naturally occurring, unbounded, self-adjoint operator,
and are therefore real. Later, Hilbert contributed to the conjecture,
and reportedly introduced the phrase 'spectrum' to describe the
eigenvalues of an equivalent Hermitian operator, apparently by analogy 
with the optical spectra observed in atoms. This remarkable analogy 
pre-dated Heisenberg's Matrix Mechanics and the Hamiltonian formu- 
lation of Quantum Mechanics by more than a decade. Not surprisingly, 
the Polya-Hilbert conjecture was considered so intractable that it was 
not pursued for decades, and Random Matrix Theory remained in a 
dormant state. To quote Diaconis [Di1]: "Historically, Random Matrix
Theory was started by Statisticians [Wis] studying the correlations
between different features of population (height, weight, income...). This
led to correlation matrices with (i,j) entry the correlation between the
ith and jth features. If the data were based on a random sample from
a larger population, these correlation matrices are random; the study
of how the eigenvalues of such samples fluctuate was one of the first
great accomplishments of Random Matrix Theory." Diaconis [Di2] has
given an extensive review of Random Matrix Theory from the perspec- 
tive of a statistician. A strong argument can be made, however, that 
Random Matrix Theory, as we know it today in the Physical Sciences,
began in a formal mathematical sense with the Wigner surmise [Wig5]
concerning the spacing distribution of adjacent resonances (of the same 
spin and parity) in the interactions between low-energy neutrons and 
nuclei, discussed below. 



The first reference to this conjecture in the literature might not have been until
1973 by Montgomery [Mon].

If $v$ is an eigenvector with eigenvalue $\lambda$ of a Hermitian matrix $A$ (so $A = A^*$
with $A^*$ the complex conjugate transpose of $A$), then $v^*(Av) = v^*(A^*v) = (Av)^*v$;
the first expression is $\lambda\|v\|^2$ while the last is $\overline{\lambda}\|v\|^2$, with
$\|v\|^2 = v^*v = \sum |v_i|^2$ non-zero. Thus $\lambda = \overline{\lambda}$, and the eigenvalues are real.
This is one of the most important properties of Hermitian matrices, as it allows
us to order the eigenvalues.
3.2. Nuclear Physics and Random Matrix Theory. The period
from the mid-1930's to the late 1970's was the Golden Age of Neu- 
tron Physics; widespread interest in understanding the physics of the 
nucleus, coupled with the need for accurate data in the design of nu- 
clear reactors, made the field of Neutron Physics of global importance 
in fundamental Physics, Technology, Economics, and Politics. In the 
mid-1950's, a discovery was made that turned out to have far-reaching 
consequences beyond anything that those working in the field could 
have imagined. For the first time, it was possible to study the mi- 
crostructure of the continuum in a strongly-coupled, many-body sys- 
tem, at very high excitation energies. This unique situation came about 
as the result of the following facts: 



• Neutrons, with kinetic energies of a few electron-volts, excite 
states in compound nuclei at energies ranging from about 5 mil- 
lion electron-volts to almost 10 million electron-volts - typical 
neutron binding energies. Schematically, see Figure 2.



• Low-energy resonant states in heavy nuclei (mass numbers greater 
than about 100) have lifetimes in the range 10~^^ to 10~^^ sec- 
onds, and therefore they have widths of about 1 eV. The com- 
pound nucleus loses all memory of the way in which it is formed. 
It takes a relatively long time for sufficient energy to reside in a 
neutron before being emitted. This is a highly complex, statis- 
tical process. In heavy nuclei, the average spacing of adjacent 
resonances is typically in the range from a few eV to several 
hundred eV. 

• Just above the neutron binding energy, the angular momentum
barrier restricts the possible range of values of total spin of a
resonance, J (J = I + i + l, where I is the spin of the target
nucleus, i is the neutron spin, and l is the relative orbital
angular momentum). This is an important technical point.

• The neutron time-of-flight method provides excellent energy
resolution at energies up to several keV. (See Firk [Fi] for a 
review of time-of-flight spectrometers.) 
Figure 2. An energy-level diagram showing the location
of highly-excited resonances in the compound nucleus
formed by the interaction of a neutron, n, with a
nucleus of mass number A. Nature provides us with a 
narrow energy region in which the resonances are clearly 
separated, and are observable. 



A 1-eV neutron travels 1 meter in 72.3 microseconds. At non-relativistic
energies, the energy resolution $\Delta E$ at an energy $E$ is simply:

$$\Delta E \approx 2E\,\Delta t/t_E, \tag{3.1}$$

where $\Delta t$ is the total timing uncertainty, and $t_E$ is the flight time for
a neutron of energy $E$.

In 1958, the two highest-resolution neutron spectrometers in the
world had total timing uncertainties $\Delta t \approx 200$ nanoseconds. For a
flight-path length of 50 meters the resolution was $\Delta E \approx 3$ eV at 1
keV.
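The quoted resolution follows directly from (3.1). As a rough check, the sketch below (function and constant names are ours; it assumes only the non-relativistic scaling of flight time as path length divided by $\sqrt{E}$, together with the 72.3 microseconds per meter figure for a 1-eV neutron) computes the flight time and the resulting energy resolution.

```python
import math

# Flight time of a 1-eV neutron over 1 meter, in microseconds (value quoted in the text).
T_1EV_US_PER_M = 72.3


def flight_time_us(energy_ev, path_m):
    """Non-relativistic flight time: speed scales like sqrt(E), so time scales like L / sqrt(E)."""
    return T_1EV_US_PER_M * path_m / math.sqrt(energy_ev)


def energy_resolution_ev(energy_ev, path_m, dt_us):
    """Equation (3.1): Delta E is approximately 2 E Delta t / t_E."""
    return 2.0 * energy_ev * dt_us / flight_time_us(energy_ev, path_m)


if __name__ == "__main__":
    # 1-keV neutrons, 50-meter flight path, 200-ns timing uncertainty
    print(round(flight_time_us(1000.0, 50.0), 1), "microseconds")
    print(round(energy_resolution_ev(1000.0, 50.0, 0.2), 2), "eV")
```

For a 50-meter path and a 200-nanosecond timing uncertainty this gives about 3.5 eV at 1 keV, in line with the approximate value of 3 eV quoted above.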

In $^{238}$U + n the excitation energy is about 5 MeV; the effective
resolution for a 1-keV neutron was therefore

$$\Delta E/E_{\text{effective}} \approx 6 \cdot 10^{-7} \tag{3.2}$$

(at 1 eV, the effective resolution was about 10~^^). 

Two basic broadening effects limit the sensitivity of the method:

(1) Doppler broadening of the resonance profile due to the thermal
motion of the target nuclei; it is characterized by the quantity
$\delta \approx 0.3\sqrt{E/A}$ (eV), where $A$ is the mass number of the target
and $E$ is measured in eV. If $E$ = 1 keV and $A$ = 200, $\delta \approx 0.7$ eV, a value
that may be ten times greater than the natural width of the resonance.

(2) Resolution broadening of the observed profile due to the finite 
resolving power of the spectrometer. For a review of the exper- 
imental methods used to measure neutron total cross sections 
see Firk and Melkonian [FM]. Lynn [Ly] has given a detailed
account of the theory of neutron resonance reactions. 

In the early 1950's, the field of low-energy neutron resonance spec- 
troscopy was dominated by research groups working at nuclear reactors. 
They were located at National Laboratories in the United States, the 
United Kingdom, Canada, and the former USSR. The energy spectrum 
of fission neutrons produced in a reactor is moderated in a hydroge- 
nous material to generate an enhanced flux of low-energy neutrons. 
To carry out neutron time-of-flight spectroscopy, the continuous flux
from the reactor is "chopped" using a massive steel rotor with fine slits
through it. At the maximum attainable speed of rotation (about 20,000
rpm), and with slits a few thousandths-of-an-inch in width, it is possible
to produce pulses each with a duration of approximately 1 μsec. The
chopped beams have rather low fluxes, and therefore the flight paths
are limited in length to less than 50 meters. The resolution at 1 keV
is then $\Delta E \approx 20$ eV, clearly not adequate for the study of resonance
spacings of about 10 eV.

In 1952, there were only four accelerator-based, low-energy neutron 
spectrometers operating in the world. They were at Columbia Univer- 
sity in New York City, Brookhaven National Laboratory, the Atomic 
Energy Research Establishment, Harwell, England, and at Yale Univer- 
sity. The performances of these early accelerator-based spectrometers 
were comparable with those achieved at the reactor-based facilities. It
was clear that the basic limitations of the neutron-chopper spectrom- 
eters had been reached, and therefore future developments in the field 
would require improvements in accelerator-based systems. 

In 1956, a new high-powered injector for the electron gun of the 
Harwell electron linear accelerator was installed to provide electron 
pulses with very short durations (typically less than 200 nanoseconds) 
[FRG]. The pulsed neutron flux (generated by the ($\gamma$, n) reaction)
was sufficient to permit the use of a 56-meter flight path; an energy 
resolution of 3 eV at 1 keV was achieved. 

At the same time, Professors Havens and Rainwater (pioneers in
the field of neutron time-of-flight spectroscopy) and their colleagues at
Columbia University were building a new 385-MeV proton synchrocyclotron
a few miles north of the campus (at the Nevis Laboratory). The
accelerator was designed to carry out experiments in meson physics and
low-energy neutron physics (neutrons generated by the (p, n) reaction).
By 1958, they had produced a pulsed proton beam with duration of
25 nanoseconds, and had built a 37-meter flight path [RDRH, DRRH].
The hydrogenous neutron moderator generated an effective pulse width
of about 200 nanoseconds for 1-keV neutrons. In 1960, the length of the
flight path was increased to 200 meters, thereby setting a new standard
in neutron time-of-flight spectroscopy [GRPH].

3.3. The Wigner Surmise. At a conference on Neutron Physics by
Time-of-Flight, held in Gatlinburg, Tennessee on November 1st and
2nd, 1956, Professor Eugene Wigner (Nobel Laureate in Physics, 1963)
presented his surmise regarding the theoretical form of the spacing
distribution of adjacent neutron resonances (of the same spin and parity)
in heavy nuclei. At the time, the prevailing wisdom was that the spacing
distribution had a Poisson form (see, however, [GP]). The limited
experimental data then available were not sufficiently precise to fix the
form of the distribution (see [Hu]). The following quotation, taken
from Wigner's presentation at the conference, introduces the concept
of random matrices in Physics for the first time:

"Perhaps I am now too courageous when I try to guess the distribution
of the distances between successive levels. I should re-emphasize
that levels that have different J -values (total spin) are not connected 
with each other. They are entirely independent. So far, experimental 
data are available only on even-even elements. Theoretically, the sit- 
uation is quite simple if one attacks the problem in a simple-minded 
fashion. The question is simply 'what are the distances of the charac- 
teristic values of a symmetric matrix with random coefficients?' 
We know that the chance that two such energy levels coincide is
infinitely unlikely. We consider a two-dimensional matrix,

$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix},$$

in which case the distance between two levels is $\sqrt{(a_{11} - a_{22})^2 + 4a_{12}^2}$.
This distance can be zero only if $a_{11} = a_{22}$ and $a_{12} = 0$. The difference
between the two energy levels is the distance of a point from the origin,
the two coordinates of which are $(a_{11} - a_{22})$ and $a_{12}$. The probability
that this distance is S is, for small values of S, always proportional to 
S itself because the volume element of the plane in polar coordinates 
contains the radius as a factor. 

The probability of finding the next level at a distance S now becomes
proportional to $S\,dS$. Hence the simplest assumption will give the
probability

$$\frac{\pi}{2}\rho^2 \exp\left(-\frac{\pi}{4}\rho^2 S^2\right) S\,dS \tag{3.3}$$

for a spacing between $S$ and $S + dS$.

If we put $x = \rho S = S/\langle S\rangle$, where $\langle S\rangle$ is the mean spacing, then the
probability distribution takes the standard form

$$p(x)\,dx = \frac{\pi}{2}\,x\exp\left(-\pi x^2/4\right)dx, \tag{3.4}$$

where the coefficients are obtained by normalizing both the area and the
mean to unity."

This form, in which the probability of zero spacing is zero, is strikingly
different from the Poisson form

$$p(x)\,dx = \exp(-x)\,dx \tag{3.5}$$

in which the probability is a maximum for zero spacing. The form of
the Wigner surmise had been previously discussed by Wigner himself
[Wig1], and by Landau and Smorodinsky [LS], but not in the spirit of
Random Matrix Theory.
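Wigner's two-dimensional argument can be reproduced with a few lines of simulation. The sketch below is illustrative only: the function names are ours, and we assume Gaussian entries with the off-diagonal variance equal to half the diagonal variance (a common convention that makes the two coordinates $a_{11} - a_{22}$ and $2a_{12}$ identically distributed; the quotation itself does not fix the distribution). It samples the level spacing $\sqrt{(a_{11} - a_{22})^2 + 4a_{12}^2}$, rescales to mean spacing one, and compares the fraction of small spacings with the Wigner and Poisson predictions.

```python
import math
import random


def wigner_cdf(x):
    """Cumulative distribution of p(x) = (pi/2) x exp(-pi x^2 / 4)."""
    return 1.0 - math.exp(-math.pi * x * x / 4.0)


def poisson_cdf(x):
    """Cumulative distribution of p(x) = exp(-x)."""
    return 1.0 - math.exp(-x)


def sample_normalized_spacings(count, seed=0):
    """Level spacings of random 2 x 2 real symmetric matrices, rescaled to mean 1."""
    rng = random.Random(seed)
    spacings = []
    for _ in range(count):
        a11 = rng.gauss(0.0, 1.0)
        a22 = rng.gauss(0.0, 1.0)
        a12 = rng.gauss(0.0, math.sqrt(0.5))  # assumed: half the diagonal variance
        spacings.append(math.sqrt((a11 - a22) ** 2 + 4.0 * a12 ** 2))
    mean = sum(spacings) / count
    return [s / mean for s in spacings]


if __name__ == "__main__":
    xs = sample_normalized_spacings(50000)
    for cut in (0.1, 0.5, 1.0):
        frac = sum(1 for s in xs if s < cut) / len(xs)
        print(cut, round(frac, 4), round(wigner_cdf(cut), 4), round(poisson_cdf(cut), 4))
```

The scarcity of very small spacings, compared with what the Poisson form would predict, is the level repulsion described above.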

It is interesting to note that the Wigner distribution is a special 
case of a general statistical distribution, named after Professor E. 
H. Waloddi Weibull (1887-1979), a Swedish engineer and statistician 
[Wei]. For many years, the distribution has been in widespread use in
statistical analyses in industries such as aerospace, automotive, electric
power, nuclear power, communications, and life insurance. The
distribution gives the lifetimes of objects and is therefore invaluable in 



"'^In fact, one of the authors has used Weibull distributions to model run pro- 
duction in major league baseball, giving a theoretical justification for Bill James' 
Pythagorean Won-Loss formula [Mil3] . 
studies of the failure rates of objects under stress (including people!).
The Weibull probability density function is

$$\mathrm{Wei}(x; k, \lambda) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}\exp\left(-(x/\lambda)^k\right) \tag{3.6}$$

where $x \ge 0$, $k > 0$ is the shape parameter, and $\lambda > 0$ is the scale
parameter. We see that $\mathrm{Wei}(x; 2, 2/\sqrt{\pi}) = p(x)$, the Wigner distribution.
Other important Weibull distributions are given in the following list:

• $\mathrm{Wei}(x; 1, 1) = \exp(-x)$, the Poisson distribution;

• $\mathrm{Wei}(x; 2, \lambda) = \mathrm{Ray}(\lambda)$, the Rayleigh distribution;

• $\mathrm{Wei}(x; 3, \lambda)$ is approximately a normal distribution.

For $\mathrm{Wei}(x; k, \lambda)$, the mean is $\lambda\Gamma(1 + 1/k)$, the median is $\lambda(\log 2)^{1/k}$,
and the mode is $\lambda(k - 1)^{1/k}/k^{1/k}$, if $k > 1$. As $k \to \infty$, the Weibull
distribution has a sharp peak at $\lambda$.
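These identities are simple to verify numerically. The following sketch (function names are ours) evaluates the Weibull density with shape $k$ and scale $\lambda$, checks that the choice $k = 2$, $\lambda = 2/\sqrt{\pi}$ reproduces the Wigner density $(\pi/2)x\exp(-\pi x^2/4)$, and evaluates the stated formulas for the mean, median, and mode.

```python
import math


def weibull_pdf(x, k, lam):
    """Weibull density (k / lam) (x / lam)^(k-1) exp(-(x / lam)^k) for x >= 0."""
    if x < 0:
        return 0.0
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))


def wigner_pdf(x):
    """Wigner distribution p(x) = (pi/2) x exp(-pi x^2 / 4)."""
    return (math.pi / 2.0) * x * math.exp(-math.pi * x * x / 4.0)


def weibull_mean(k, lam):
    return lam * math.gamma(1.0 + 1.0 / k)


def weibull_median(k, lam):
    return lam * math.log(2.0) ** (1.0 / k)


def weibull_mode(k, lam):
    # valid for k > 1
    return lam * ((k - 1.0) / k) ** (1.0 / k)


if __name__ == "__main__":
    lam = 2.0 / math.sqrt(math.pi)
    for x in (0.25, 0.5, 1.0, 2.0):
        print(x, weibull_pdf(x, 2, lam), wigner_pdf(x))
    print("mean of the Wigner distribution:", weibull_mean(2, lam))
```

In particular the mean of $\mathrm{Wei}(x; 2, 2/\sqrt{\pi})$ is $(2/\sqrt{\pi})\Gamma(3/2) = 1$, as required by the normalization of the mean spacing.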

At the time of the Gatlinburg conference, no more than 20 s-wave 
neutron resonances had been clearly resolved in a single compound nu- 
cleus and therefore it was not possible to make a definitive test of the 
Wigner surmise. Immediately following the conference, J. A. Harvey
and D. J. Hughes [HH], and their collaborators, working with the
fast-neutron-chopper groups at the high flux reactor at the Brookhaven
National Laboratory, and at the Oak Ridge National Laboratory, gathered
their own limited data, and all the data from neutron spectroscopy 
groups around the world, to obtain the first global spacing distribu- 
tion of s-wave neutron resonances. Their combined results, published 
in 1958, showed a distinct lack of very closely spaced resonances, in 
agreement with the Wigner surmise. 

By late 1959, the experimental situation had improved greatly. At
Columbia University, two students of Professors Havens and Rainwater
completed their PhD theses; one, Joel Rosen [RDRH], studied the first
55 resonances in $^{238}$U + n up to 1 keV, and the other, J. Scott Desjardins
[DRRH], studied resonances in two silver isotopes (of different spin) in
the same energy region. These were the first results from the new
high-resolution neutron facility at the Nevis cyclotron.

At Harwell, Firk, Lynn, and Moxon [FLM] completed their study
of the first 100 resonances in $^{238}$U + n at energies up to 1.8 keV;

Obviously this Weibull cannot be a normal distribution, as they have very
different decay rates for large $x$, and this Weibull is a one-sided distribution! What
we mean is that for $0 < x < 2$ this Weibull is well approximated by a normal
distribution which shares its mean and variance, which are (respectively)
$\Gamma(4/3) \approx .893$ and $\Gamma(5/3) - \Gamma(4/3)^2 \approx .105$.

Historically, Frechet introduced this distribution in 1927, and Nuclear
Physicists often refer to the Weibull distribution as the Brody distribution [BFFMPW].
Figure 3. High resolution studies of the total neutron
cross section of $^{238}$U, in the energy range 400 eV - 1800 eV
(12). The vertical scale (in units of "barns") is a measure
of the effective area of the target nucleus.



their measurement of the total neutron cross section for the interaction
$^{238}$U + n in the energy range 400-1800 eV is shown in Figure 3.

When this experiment began in 1956, no resonances had been re- 
solved at energies above 500 eV. The distribution of adjacent spacings 
of the first 100 resonances in the single compound nucleus, $^{238}$U + n,

Figure 4. A Wigner distribution fitted to the spacing
distribution of 932 s-wave resonances in the interaction
$^{238}$U + n at energies up to 20 keV.

ruled out an exponential distribution and provided the best evidence 
(then available) in support of Wigner's proposed distribution. 

Over the last half-century, numerous studies have not changed the 
basic findings. At the present time, almost 1000 s-wave neutron reso- 
nances in the compound nucleus ^^^U have been observed in the energy 
range up to 20 keV. The latest results, with their greatly improved
statistics, are shown in Figure 4 [DLL].

3.4. Further Developments. The first numerical investigation of the
distribution of successive eigenvalues associated with random matrices
was carried out by Porter and Rosenzweig in the late 1950's [PR].
They diagonalized a large number of matrices where the elements are
generated randomly but constrained by a probability distribution. The
analytical theory developed in parallel with their work: Mehta [Meh1],
Mehta and Gaudin [MG], and Gaudin [Gau]. At the time it was clear
that the spacing distribution was not influenced significantly by the
chosen form of the probability distribution. Remarkably, the $n \times n$
distributions had forms given almost exactly by the original Wigner
$2 \times 2$ distribution.
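A small numerical experiment in the spirit of these early investigations is sketched below. It is not a reconstruction of the original computations: the matrix size, the two entry distributions, and the shortcut of using only the central gap of each matrix (rescaled by its average over the sample, instead of a careful unfolding of the whole spectrum) are our simplifying assumptions. It uses NumPy's `numpy.linalg.eigvalsh` to diagonalize real symmetric matrices.

```python
import numpy as np


def gaussian_entries(rng, shape):
    return rng.standard_normal(shape)


def uniform_entries(rng, shape):
    # uniform on [-sqrt(3), sqrt(3)] has mean 0 and variance 1
    return rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), shape)


def central_spacings(n, trials, sampler, seed=0):
    """Gap between the two middle eigenvalues of n x n real symmetric matrices, rescaled to mean 1."""
    rng = np.random.default_rng(seed)
    gaps = np.empty(trials)
    for t in range(trials):
        m = sampler(rng, (n, n))
        a = np.triu(m) + np.triu(m, 1).T  # symmetrize: keep the upper triangle, mirror it
        ev = np.linalg.eigvalsh(a)        # eigenvalues in ascending order
        gaps[t] = ev[n // 2] - ev[n // 2 - 1]
    return gaps / gaps.mean()


if __name__ == "__main__":
    wigner = 1.0 - np.exp(-np.pi * 0.25 ** 2 / 4.0)  # Wigner surmise: P(spacing < 0.25)
    poisson = 1.0 - np.exp(-0.25)                    # Poisson: P(spacing < 0.25)
    for name, sampler in (("gaussian", gaussian_entries), ("uniform", uniform_entries)):
        s = central_spacings(20, 3000, sampler)
        print(name, round(float(np.mean(s < 0.25)), 4), round(wigner, 4), round(poisson, 4))
```

Under these assumptions both entry distributions should give a fraction of small spacings far below the Poisson value and close to that of the $2 \times 2$ surmise.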
The linear dependence of $p(x)$ on the normalized spacing $x$ (for small
$x$) is a direct consequence of the symmetries imposed on the (Hamiltonian)
matrix, $H(h_{ij})$. Dyson [Dy1] discussed the general mathematical
properties associated with random matrices and made fundamental
contributions to the theory by showing that different results are
obtained when different symmetries are assumed for $H$. He introduced
three basic distributions; in Physics, only two are important:

• the Gaussian Orthogonal Ensemble (GOE) for systems in which
rotational symmetry and time-reversal invariance hold (the
Wigner distribution): $p(x) = (\pi/2)\,x\exp(-(\pi/4)x^2)$;

• the Gaussian Unitary Ensemble (GUE) for systems in which
time-reversal invariance does not hold (French et al. [FKPT]):
$p(x) = (32/\pi^2)\,x^2\exp(-(4/\pi)x^2)$.

The mathematical details associated with these distributions are
given in [Meh1].
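Both spacing distributions are normalized so that the total probability and the mean spacing equal one. The sketch below checks this by direct numerical integration, using the orthogonal form $(\pi/2)x\exp(-(\pi/4)x^2)$ and the unitary form in its standard normalization, $(32/\pi^2)x^2\exp(-(4/\pi)x^2)$. The function names, the midpoint rule, and the cutoff at $x = 12$ are our choices.

```python
import math


def goe_pdf(x):
    """Orthogonal-ensemble surmise: (pi/2) x exp(-(pi/4) x^2)."""
    return (math.pi / 2.0) * x * math.exp(-(math.pi / 4.0) * x * x)


def gue_pdf(x):
    """Unitary-ensemble surmise: (32/pi^2) x^2 exp(-(4/pi) x^2)."""
    return (32.0 / math.pi ** 2) * x * x * math.exp(-(4.0 / math.pi) * x * x)


def integrate(f, a, b, steps=20000):
    """Midpoint rule; ample accuracy for these smooth, rapidly decaying integrands."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    for name, pdf in (("GOE", goe_pdf), ("GUE", gue_pdf)):
        area = integrate(pdf, 0.0, 12.0)
        mean = integrate(lambda x: x * pdf(x), 0.0, 12.0)
        print(name, round(area, 6), round(mean, 6))
```

The linear versus quadratic vanishing at $x = 0$ is the visible difference between the two symmetry classes: level repulsion is stronger when time-reversal invariance is absent.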

The impact of these developments was not immediate in Nuclear 
Physics. At the time, the main research endeavors were concerned 
with the structure of nuclei (experiments and theories connected with
Shell-, Collective-, and Unified models) and with the nucleon-nucleon
interaction. The study of Quantum Statistical Mechanics was far removed
from the main stream. Almost two decades went by before
Random Matrix Theory was introduced in other fields of Physics (see,
for example, Bohigas, Giannoni and Schmit [BGS] and Alhassid [Al]).

3.5. From Physics to Number Theory. Interestingly, the next
development occurred in an area having nothing to do with Physics. In
the field of Number Theory, perhaps the greatest unsolved problem
has to do with the Riemann conjecture (that dates from the mid-19th
century): if $\zeta(s) = \sum_{n=1}^{\infty} 1/n^s$, then every complex number $\rho$ in the
critical strip ($0 < \Re(\rho) < 1$) at which the analytic continuation of $\zeta(s)$
has a non-trivial zero has real part equal to 1/2. In 1914, Hardy [Har]
proved that there are infinitely many zeros of the zeta function on the
critical line $\Re(s) = 1/2$. Later Selberg [Sel1] proved a small positive
percentage are on this line; this was improved by Levinson [Lev] to
a third, and now thanks to Conrey [Con1] we know at least two-fifths
lie on the line.

There is an interesting perspective to proving more than a third of the zeros lie
on the critical line. As zeros off the line occur in complex conjugate pairs, proving 
more than a third of all non-trivial zeros lie on the line is equivalent to more than 
In the early 1970's, Hugh Montgomery, a mathematician at the University
of Michigan, was investigating the relative spacing of the zeros
of the zeta function [Mon] (because of applications to the class number
problem). Let us recall that, if we have a series of points distributed
randomly along a line, with average density normalized to 1, and we 
treat the coordinates of the points as independent random variables, 
then the probability of finding j points in a given interval of length x 
is the Poisson distribution

$$\frac{x^j}{j!}\,e^{-x}.$$
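For independent points of unit density, the probability of finding $j$ points in an interval of length $x$ is $x^j e^{-x}/j!$. The sketch below illustrates this under stated assumptions: the function names, the finite line length of 200, and the number of trials are ours, and scattering a fixed number of uniform points on a long segment only approximates an ideal Poisson process.

```python
import math
import random


def poisson_probability(j, x):
    """Probability of exactly j points in an interval of length x (unit density)."""
    return x ** j * math.exp(-x) / math.factorial(j)


def empirical_count_frequencies(x, line_length=200, trials=5000, seed=0):
    """Scatter line_length independent uniform points on [0, line_length]; count those in [0, x)."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(trials):
        j = sum(1 for _ in range(line_length) if rng.uniform(0.0, line_length) < x)
        counts[j] = counts.get(j, 0) + 1
    return {j: c / trials for j, c in sorted(counts.items())}


if __name__ == "__main__":
    freqs = empirical_count_frequencies(1.0)
    for j, f in freqs.items():
        print(j, round(f, 4), round(poisson_probability(j, 1.0), 4))
```

Note in particular that finding two or more points in a short interval is not especially rare for independent points; this is the behavior against which eigenvalue repulsion is measured.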



For our real symmetric and complex Hermitian random matrix en- 
sembles, the probability of finding more than one eigenvalue in a short 
interval is less than that given by the Poisson distribution - the
eigenvalues of the random matrix are said to 'repel' each other. The pair
and higher level correlation functions describe this effect (we discuss
these functions in greater detail later in the paper; knowing all the
correlation functions is equivalent to knowing the neighbor spacings).
Montgomery studied the pair correlation function for the zeros of the
zeta function and he gave evidence that it has the asymptotic form

$$1 - \left(\frac{\sin \pi x}{\pi x}\right)^2.$$

At a chance meeting between Montgomery and Dyson at Princeton 
in the early 1970's, Montgomery showed his pair correlation function 
to Dyson, who recognized it as the pair correlation function of eigen- 
values of random Hermitian matrices in a Gaussian Unitary Ensemble 



a half of all zeros with real part at least 1/2 are on the line! Thus, in this sense, a 
'majority' of all zeros are on the critical line. 

The class number measures how much unique factorization fails in the ring of
integers of a finite extension of $\mathbb{Q}$, and thus is an extremely important property
of these fields. For example, $\mathbb{Z}[i] = \{a + ib : a, b \in \mathbb{Z}\}$ has unique factorization,
while $\mathbb{Z}[i\sqrt{5}] = \{a + ib\sqrt{5} : a, b \in \mathbb{Z}\}$ does not (in the latter, note we can write 6 as
either $2 \cdot 3$ or $(1 + i\sqrt{5})(1 - i\sqrt{5})$, and none of these four numbers can be factored
as $(a + ib\sqrt{5})(c + id\sqrt{5})$ without one of the two factors being a unit). The units are
numbers in the ring whose norm is 1; in $\mathbb{Z}[i\sqrt{5}]$, these numbers are $\pm 1$ (in $\mathbb{Z}[\sqrt{5}]$
we would also have numbers such as $2 + \sqrt{5}$, as $(2 + \sqrt{5})(-2 + \sqrt{5}) = 1$). The
class number problem is to find all imaginary quadratic fields with a given class
number; see [St, Wa] for more details and results. It turns out (see [CI]) that if
there are many small (relative to the average spacing) gaps between zeros of $\zeta(s)$ on
the critical line, then there are terrific lower bounds for the class number; another
connection between the class number and zeros of L-functions is through the work
of Goldfeld [CHIT] and Gross-Zagier [GZ].




(3.7) 
Figure 5. Odlyzko's test of the Montgomery conjecture,
involving 70 million Riemann zeros near $10^{20}$.

(an ensemble without time-reversal invariance). In a masterful numerical
calculation of the distribution of spacings between zeros of the zeta
function, Andrew Odlyzko [Od1, Od2] tested the Montgomery conjecture
by studying millions of normalized zeros near the $10^{20}$th and the
$10^{22}$nd zero of $\zeta(s)$. His computed correlation function shows remarkable
agreement with Montgomery's form (see Figure 5).

As we shall see, this work continues to have a profound impact on 
developments in contemporary Number Theory. 

In the remaining two sections, we explore one statistic from random 
matrix theory (the density of eigenvalues) and one from number theory 
(the 1-level density of low-lying zeros). Though these statistics are not
exactly analogous, they are similar. The reason we chose to study these
two is that the general steps of the proofs are similar, and thus this
provides a nice introduction to how intuitions and methods in one field 
can be transferred to another. 

4. Wigner's Semi-circle law 

4.1. Wigner's Semi-circle Law (Statement). We state and prove
a version of Wigner's semi-circle law below. We refer the reader to
[ERSY, ESY, TV1, TV2] for the most general version and proof of the
semi-circle law as well as spacings between adjacent eigenvalues. We
content ourselves with this special case, as this version is easy to state
and prove, and the conditions are frequently satisfied in practice.

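Theorem 4.1 (Wigner's semi-circle law). Consider the ensemble of
$N \times N$ real symmetric matrices with entries independent, identically
distributed random variables from a fixed probability distribution $p(x)$
with mean 0, variance 1, and other moments finite. Then for almost
all $A$, as $N \to \infty$

$$\mu_{A,N}(x) \longrightarrow \begin{cases} \frac{2}{\pi}\sqrt{1 - x^2} & \text{if } |x| < 1 \\ 0 & \text{otherwise.} \end{cases}$$
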
In other words, the fraction of normalized eigenvalues in an interval
$[a, b] \subset [-1, 1]$ is found by integrating the semi-circle over that interval.

Note that such a result could never hold for all $A$, as given any $\epsilon > 0$
there is always a small (though rapidly tending to zero!) probability
that we've chosen a matrix that is within $\epsilon$ units of being diagonal
(i.e., each non-diagonal entry is at most $\epsilon$ in absolute value).

For example, consider Figures 6 and 7. In the first we've drawn
the entries from the standard normal. This satisfies the conditions of
Wigner's semi-circle law, and we see already that with just $400 \times 400$
matrices the fit is excellent.

How essential are the conditions in the theorem? Does the result 
hold even if these conditions are violated, but perhaps the proof is 
just harder (or currently unknown)? To investigate this, we choose 
instead of the standard normal the Cauchy distribution $(\pi(1 + x^2))^{-1}$.
This distribution clearly has infinite variance, and thus obviously fails 
to satisfy the conditions. (It also has no mean as the integral of |x| 
is infinite.) We see that the behavior is decidedly non-semi-circular 
(the huge probabilities at the end are the probabilities of observing an 
eigenvalue that far or further). 
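The two experiments can be repeated with a short script. The sketch below is our own illustration, not the code used for the figures: it assumes the normalization of eigenvalues by $2\sqrt{N}$, uses a single $400 \times 400$ matrix per distribution, and relies on NumPy's `standard_normal` and `standard_cauchy` generators. It compares the fraction of normalized eigenvalues in $[-1/2, 1/2]$ with the semi-circle prediction.

```python
import numpy as np


def normalized_eigenvalues(n, sampler, rng):
    """Eigenvalues of a random n x n real symmetric matrix, divided by 2 sqrt(n)."""
    m = sampler((n, n))
    a = np.triu(m) + np.triu(m, 1).T  # symmetrize: keep the upper triangle, mirror it
    return np.linalg.eigvalsh(a) / (2.0 * np.sqrt(n))


def semicircle_mass(a, b):
    """Integral of (2/pi) sqrt(1 - x^2) over [a, b], for -1 <= a <= b <= 1."""
    def antiderivative(x):
        return (x * np.sqrt(1.0 - x * x) + np.arcsin(x)) / np.pi
    return antiderivative(b) - antiderivative(a)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gauss = normalized_eigenvalues(400, rng.standard_normal, rng)
    cauchy = normalized_eigenvalues(400, rng.standard_cauchy, rng)
    predicted = semicircle_mass(-0.5, 0.5)
    print("semi-circle prediction:", round(float(predicted), 4))
    print("Gaussian entries:      ", round(float(np.mean(np.abs(gauss) <= 0.5)), 4))
    print("Cauchy entries:        ", round(float(np.mean(np.abs(cauchy) <= 0.5)), 4))
    print("largest |eigenvalue|, Cauchy:", float(np.abs(cauchy).max()))
```

With Gaussian entries the observed fraction should sit close to the prediction (about 0.609), while with Cauchy entries a few enormous eigenvalues appear far outside $[-1, 1]$.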

We will use the Method of Moments to prove the semi-circle law. We
briefly summarize how we can pass from knowledge of the moments to
knowledge of the eigenvalue distribution. Recall the $k$-th moment is
$\sum_{i=1}^{N} \lambda_i(A)^k / (2^k N^{k/2+1})$. Imagine we had a $1 \times 1$ matrix and we knew the
first moment of the eigenvalues (well, here it would just be eigenvalue).
We have one equation in one unknown:

$$\lambda_1(A)/2 = \mu_1; \tag{4.1}$$

this is clearly solvable and we can express $\lambda_1(A)$ in terms of $\mu_1$. Imagine
now we have a $2 \times 2$ matrix. Then we have two equations in two
[Plot: "Distribution of eigenvalues from Gaussian (N = 400 with 500 repetitions)"; vertical axis: probability; horizontal axis: normalized eigenvalue, from -1 to 1.]

Figure 6. A histogram plot of the normalized eigenvalues
for 500 matrices, each $400 \times 400$. The entries are
chosen independently from the standard normal
$p(x) = (2\pi)^{-1/2}\exp(-x^2/2)$.

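unknowns:

$$\frac{1}{4\sqrt{2}}\left(\lambda_1(A) + \lambda_2(A)\right) = \mu_1$$

$$\frac{1}{16}\left(\lambda_1(A)^2 + \lambda_2(A)^2\right) = \mu_2. \tag{4.2}$$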

For almost all $\mu_1$ and $\mu_2$ this is solvable (actually, we do not have to
worry about this ever not being solvable, as the $\lambda_i(A)$ are always drawn
from a matrix, and thus the equations will be consistent). We can
therefore express the two eigenvalues in terms of the first two moments.
Similarly, if we looked at the first three moments of a $3 \times 3$ matrix we
would have enough information to find the eigenvalues. In the general
case, we need to know the first $N$ moments to find the eigenvalues of
an $N \times N$ matrix; as we are letting $N \to \infty$, we need to compute all
the moments to determine the eigenvalues.
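One concrete way to invert the relations between moments and eigenvalues is through Newton's identities, which convert the power sums $\sum_i \lambda_i^k$ (for $k = 1, \ldots, N$) into the coefficients of the characteristic polynomial, whose roots are the eigenvalues. The sketch below is one possible illustration and not the method used in the proof; the function name and the $3 \times 3$ test matrix are ours, and for simplicity we work with unnormalized power sums, which differ from the moments above only by the known factor $2^k N^{k/2+1}$.

```python
import numpy as np


def eigenvalues_from_power_sums(power_sums):
    """Recover N numbers from their first N power sums p_k = sum_i lambda_i^k, k = 1..N."""
    n = len(power_sums)
    # Newton's identities: k e_k = sum_{i=1}^{k} (-1)^(i-1) e_{k-i} p_i, with e_0 = 1
    e = [1.0]
    for k in range(1, n + 1):
        s = sum((-1) ** (i - 1) * e[k - i] * power_sums[i - 1] for i in range(1, k + 1))
        e.append(s / k)
    # prod_i (x - lambda_i) = sum_k (-1)^k e_k x^(n-k); coefficients listed from highest degree
    coeffs = [(-1) ** k * e[k] for k in range(n + 1)]
    return np.sort(np.roots(coeffs).real)


if __name__ == "__main__":
    a = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 4.0]])
    # power sums computed as traces of powers, without diagonalizing the matrix
    p = [np.trace(np.linalg.matrix_power(a, k)) for k in range(1, 4)]
    print(eigenvalues_from_power_sums(p))
    print(np.linalg.eigvalsh(a))
```

Root-finding from polynomial coefficients is numerically delicate for large $N$, so this is meant only to make the counting argument (N moments for N eigenvalues) tangible.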

The idea of the proof is as follows. For each matrix $A$ we calculate
its moments; let us denote the $k$-th moment of $A$ by $M_{A,N}(k)$. The



Of course, one has to invert these relations to find the eigenvalues! 
[Plot: "Distribution of eigenvalues from Cauchy (N = 400 with 500 repetitions)"; vertical axis: probability; horizontal axis: normalized eigenvalue, from -200 to 200.]

Figure 7. A histogram plot of the normalized eigenvalues
for 500 matrices, each $400 \times 400$. The entries are
drawn from the Cauchy distribution $(\pi(1 + x^2))^{-1}$. The
bin on the extreme right represents all normalized eigenvalues
that large or larger (and similarly for the bin on
the extreme left).

expected value of this is

$$M_N(k) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} M_{A,N}(k)\,\mathrm{Prob}(A)\,dA. \tag{4.3}$$

We then show that $\lim_{N \to \infty} M_N(k) = C(k)$, the $k$-th moment of the
semi-circle. This is almost, but not quite, enough to then conclude
that a central limit theorem type situation occurs, and a generic
eigenvalue measure is close to the system average (which converges
to the semi-circle as $N \to \infty$). The reason it is not sufficient is that
we must also control the variance; however, this is easily done by
similar arguments (see for example [HM, MMS]).



"'^See |GS[ IFel] for proofs of the central limit theorem, or [MT-B] for a sketch of 
the proof. 

For example, imagine for all $N$ we always had half the moments equal 0 and
the other half equal $2C(k)$; then the average is $C(k)$ but no measure is close to the
system average.
4.2. Wigner's Semi-circle Law (Sketch of Proof). We sketch the
proof of Wigner's Semi-circle Law. As we've stated earlier, the reason 
we chose to prove this (as opposed to many of the other results) is that 
this proof is mostly self-contained, and highlights the key features of 
proofs in the subject. 

There are typically three steps in working with random matrix en- 
sembles (or the corresponding number theory quantities). We state 
these steps below, and then elaborate in great detail. 

(1) Determine the correct scale. 

(2) Develop an explicit formula relating what we want to study to 
something we understand. 

(3) Use an averaging formula to analyze the quantities above. 

Note it is not always trivial to figure out what is the correct statistic
to study, and frequently very advanced combinatorics are needed to
analyze the quantities.^{40}

We describe these steps in detail for our random matrix ensembles
and the semi-circle law. The key input is the following basic result
from linear algebra, the Eigenvalue Trace Lemma.^{41}

Theorem 4.2 (Eigenvalue Trace Lemma). Let $A$ be an $N \times N$ matrix
with eigenvalues $\lambda_i(A)$. Then

$$\mathrm{Trace}(A^k) = \sum_{i=1}^{N} \lambda_i(A)^k. \tag{4.4}$$

As the trace of a matrix is the sum of its diagonal entries,

$$\mathrm{Trace}(A^k) = \sum_{i_1=1}^{N} \cdots \sum_{i_k=1}^{N} a_{i_1 i_2} a_{i_2 i_3} \cdots a_{i_k i_1}. \tag{4.5}$$
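As a quick sanity check of the Eigenvalue Trace Lemma, the following minimal sketch compares Trace(A^k), computed by repeated matrix multiplication, with the sum of the k-th powers of the eigenvalues. It is only an illustration under stated assumptions: the matrix is a hypothetical 2 x 2 real symmetric example (so the eigenvalues come from the quadratic formula), and the helper names `mat_mul`, `trace_of_power` and `eigenvalues_2x2_symmetric` are ours, not the paper's.

```python
import math


def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def trace_of_power(A, k):
    """Return Trace(A^k) for k >= 1 by repeated multiplication."""
    P = A
    for _ in range(k - 1):
        P = mat_mul(P, A)
    return sum(P[i][i] for i in range(len(A)))


def eigenvalues_2x2_symmetric(a, b, c):
    """Eigenvalues of [[a, b], [b, c]] via the quadratic formula."""
    mid = (a + c) / 2.0
    rad = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    return (mid - rad, mid + rad)


# Hypothetical entries, chosen only for illustration.
a, b, c = 2.0, -1.0, 0.5
A = [[a, b], [b, c]]
lams = eigenvalues_2x2_symmetric(a, b, c)

for k in range(1, 7):
    lhs = trace_of_power(A, k)              # left side of (4.4)
    rhs = sum(lam ** k for lam in lams)     # right side of (4.4)
    print(k, lhs, rhs)
```

The two columns should agree up to floating-point rounding for each k.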

The Eigenvalue Trace Lemma allows us to do the first two steps, 
namely determine the correct scale and relate what we want to study 

^{40} Many of the papers in the field have large sections devoted to handling combinatorics;
see for instance [HM, Rub, RS]. Interestingly, sometimes the combinatorics
cannot be handled, and in [Gao] we believe the number theory and random matrix
theory agree where both have been calculated, but we cannot do the combinatorics
to prove this.

^{41} The proof is trivial if $A$ is diagonal or upper triangular, following by definition.
As we only need this result for real symmetric matrices, which can be diagonalized,
we only give the proof in this case. Let $S$ be such that $A = S\Lambda S^{-1}$ with $\Lambda$ diagonal.
The claim follows from $A^k = S\Lambda^k S^{-1}$ and $\mathrm{Trace}(ABC) = \mathrm{Trace}(BCA)$.
(the eigenvalues) to what we know (the matrix elements we randomly
choose). We'll see later the analogue of this in number theory. 

• For the first step, we take $k = 2$ and find

$$\mathrm{Trace}(A^2) = \sum_{i=1}^{N}\sum_{j=1}^{N} a_{ij}a_{ji} = \sum_{i=1}^{N} \lambda_i(A)^2 = N \langle \lambda(A)^2 \rangle, \tag{4.6}$$

where $\langle \lambda(A)^2 \rangle$ denotes the average of the square of the eigenvalues.
As $a_{ij} = a_{ji}$ and these are drawn from a probability distribution with
mean 0 and variance 1, we have $\mathbb{E}[a_{ij}^2] = 1$, as this is just the variance.
Thus the expected value of $N$ times the average eigenvalue square is
just $N^2$, so the average eigenvalue square is of size $N$, so (heuristically)
the average eigenvalue is of size $\sqrt{N}$.^{42} Why do we then normalize the
eigenvalues by dividing by $2\sqrt{N}$ instead of $\sqrt{N}$? The reason is to make
the final formula 'clean' (i.e., this allows us to say the semi-circle law
instead of the semi-ellipse); arguments such as these will capture the
dependence on the key parameter (in this case, the $N$-dependence),
but they will not catch the constant dependence.^{43}

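• For the second step, we want to understand the eigenvalues but it
is the matrix elements we choose. The Eigenvalue Trace Lemma says
$\sum_{i=1}^{N} \lambda_i(A)^k = \mathrm{Trace}(A^k)$; thus the $k$-th moment of the
normalized eigenvalues becomes

$$M_{A,N}(k) = \frac{\sum_{i=1}^{N} \lambda_i(A)^k}{2^k N^{k/2+1}} = \frac{\mathrm{Trace}(A^k)}{2^k N^{k/2+1}}. \tag{4.7}$$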

This is a terrific exchange. We have discussed how knowing the moments
of the eigenvalues suffices to determine their distribution; this allows
us to express these moments in terms of the quantities we are


^{42} With a little more work, we could calculate the variance of the average eigenvalue
squared, using either the Central Limit Theorem or even Chebyshev's Theorem
(from probability).

^{43} This is similar in some sense to dimensional analysis arguments in physics,
which detect parameter dependence but not the constants. For example, imagine a
pendulum of mass $m$ (in kilograms), length $L$ (in meters) where the difference in rest
height (when the pendulum is down) and the raised height (where it is at angle $\theta$)
is $L_0$ meters. We assume the only force acting is gravity, with constant $g$ (in meters
per second squared). The period, which is how long (in seconds) it takes the pendulum
to do a complete cycle, must be a function of $m$, $L$, $L_0$ and $g$; however, the only
combinations of these quantities that give units of seconds are $\sqrt{L/g}$ and $\sqrt{L_0/g}$;
thus the period must be a function of these two expressions. The correct answer
turns out to be (at least for small initial displacements) approximately $2\pi\sqrt{L/g}$;
we are able to deduce the correct functional form, though the constants are beyond
such analysis.
choosing.

• For the third and final step, we note that in order for (4.7) to be
useful we must be able to average it over all $A$ in our family; in other
words, we must compute

$$M_N(k) = \mathbb{E}[M_{A,N}(k)] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \frac{\mathrm{Trace}(A^k)}{2^k N^{k/2+1}} \prod_{1 \le i \le j \le N} p(a_{ij})\,da_{ij}.$$

The advantage is that $\mathrm{Trace}(A^k)$ is a polynomial in the matrix entries,
and the integrals above can be readily evaluated.

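For example, the integral for the average second moment is

$$\frac{1}{2^2 N^2} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \sum_{i=1}^{N}\sum_{j=1}^{N} a_{ij}^2 \cdot p(a_{11})\,da_{11} \cdots p(a_{NN})\,da_{NN}.$$

The integration factors as

$$\int_{a_{ij}=-\infty}^{\infty} a_{ij}^2\, p(a_{ij})\,da_{ij} \cdot \prod_{\substack{(k,l) \ne (i,j) \\ k \le l}} \int_{a_{kl}=-\infty}^{\infty} p(a_{kl})\,da_{kl} = 1$$

(the first factor is 1 because it is the variance, which is 1 by assumption,
while the others are 1 as this is one of the defining properties of a
probability distribution). As there are $N^2$ pairs $(i,j)$, the average second
moment is $N^2/(2^2 N^2) = 1/4$.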






While the second moment calculation looks simple, the higher moment
calculations require more involved computations and combinatorics,^{44}
in particular the Catalan numbers.^{45} The point is these computations
can be done, and this analysis completes the proof (see [MT-B]
for the general arguments, and [Leh] for the combinatorics calculation).^{46}
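The three steps can be tried numerically without ever computing an eigenvalue, which is exactly the point of the Eigenvalue Trace Lemma. The sketch below is a hedged illustration, not the paper's computation: it assumes standard Gaussian entries (one admissible choice of p with mean 0 and variance 1), small hypothetical parameters (40 x 40 matrices, 30 trials), and helper names of our own. It averages Trace(A^k)/(2^k N^{k/2+1}) for k = 2 and k = 4 and compares with the semi-circle moments, which for the density (2/pi) sqrt(1 - x^2) on [-1, 1] are C_m / 4^m for k = 2m, with C_m the m-th Catalan number.

```python
import random
from math import comb


def catalan(m):
    """Catalan number C_m = binom(2m, m) / (m + 1)."""
    return comb(2 * m, m) // (m + 1)


def semicircle_moment(k):
    """k-th moment of the density (2/pi) sqrt(1 - x^2) on [-1, 1]."""
    if k % 2 == 1:
        return 0.0
    half = k // 2
    return catalan(half) / 4 ** half


def random_symmetric(n, rng):
    """Real symmetric matrix; entries on and above the diagonal are i.i.d. N(0, 1)."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            A[i][j] = A[j][i] = rng.gauss(0.0, 1.0)
    return A


def normalized_trace_moments(A):
    """Return Trace(A^k) / (2^k N^(k/2+1)) for k = 2 and k = 4."""
    n = len(A)
    # Trace(A^2) is the sum of the squares of all entries (A is symmetric).
    trace2 = sum(x * x for row in A for x in row)
    # A^2 is symmetric, so Trace(A^4) is the sum of the squares of its entries.
    trace4 = 0.0
    for i in range(n):
        for j in range(n):
            entry = sum(A[i][t] * A[j][t] for t in range(n))  # (A^2)_{ij}
            trace4 += entry * entry
    return trace2 / (2 ** 2 * n ** 2), trace4 / (2 ** 4 * n ** 3)


def average_moments(n, trials, rng):
    """Average the normalized second and fourth moments over the ensemble."""
    total2 = total4 = 0.0
    for _ in range(trials):
        m2, m4 = normalized_trace_moments(random_symmetric(n, rng))
        total2 += m2
        total4 += m4
    return total2 / trials, total4 / trials


rng = random.Random(0)
m2, m4 = average_moments(40, 30, rng)
print("second moment:", m2, "semi-circle:", semicircle_moment(2))
print("fourth moment:", m4, "semi-circle:", semicircle_moment(4))
```

For finite N the fourth moment carries a correction of order 1/N, so only approximate agreement should be expected at this size.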

4.3. Additional statistics. There are numerous other statistics we
could investigate; the density of normalized eigenvalues is by no means
the most natural, but it does highlight the general features. The
more fundamental statistic is the spacings between adjacent normalized
eigenvalues, and not their density (though the density is used to
rescale to have mean spacing 1, allowing us to compare apples and
apples). These spacings can either be attacked directly, or through
the n-level correlations and combinatorics. It is conjectured that for
our ensembles of real symmetric matrices, the spacings between normalized
eigenvalues converge to a universal measure independent of
$p$. This measure is approximately $(\pi/2)\,x \exp(-(\pi/4)x^2)$. Until very
recently this was only known if the matrix elements were chosen from
normal distributions; however, there has been great progress since the
original version of this paper was written. L. Erdős, J. A. Ramírez, B.
Schlein, T. Tao, V. Vu and H.-T. Yau [ERSY, ESY, TV1, TV2] have
removed this assumption and greatly generalized the class of matrices
where the conjecture is known; the interested reader should see these

^{44} It is not hard to show the odd moments vanish by simple counting arguments.
For the even moments, if the $a_{ij}$'s are not matched in pairs then there is negligible
contribution as $N \to \infty$. The proof follows by counting how many tuples there
can be with a given matching, and then comparing that to $N^{k/2+1}$ (the proofs are
somewhat easier if our distribution is even). For example, as we are assuming our
distribution has zero mean, if ever there was an $a_{i_\ell i_{\ell+1}}$ that was unmatched (so
neither $(i_\ell, i_{\ell+1})$ nor $(i_{\ell+1}, i_\ell)$ occurs as the index in any other factor in the trace
expansion), then the expectation of this term must vanish as each is drawn from a
mean zero distribution. The number of valid pairings of the $a_{ij}$'s (where everything
is matched in pairs) is $(2k-1)!! = (2k-1)(2k-3)\cdots$, which is the $2k$-th moment of
the standard normal; not every matching contributes fully, though, and this is why
the resulting moments are significantly smaller than the Gaussian's.

^{45} The Catalan numbers are $C_n = \frac{1}{n+1}\binom{2n}{n}$. They arise in a variety of
combinatorial problems; see for example [Stan].

^{46} If the ensemble of matrices had a different symmetry, the start of the proof
proceeds as above but the combinatorics changes. For example, looking at Real
Symmetric Toeplitz matrices (matrices constant along diagonals) leads to very different
combinatorics (and in fact a different density of states than the semi-circle);
see [HM, MMS]. If we looked at $d$-regular graphs, the combinatorics differs again,
this time involving local trees; see [McK].
Figure 8. Spacings between normalized eigenvalues of 5000 matrices
of size 300 x 300 with entries drawn uniformly from [-1, 1].

papers for details. As the best result is constantly being improved, the 
interested reader should check the arxiv, http://arxiv.org/, for the 
current status. 
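The approximate spacing law quoted above, $(\pi/2)\,x\exp(-(\pi/4)x^2)$, can be explored in the smallest possible case. The sketch below rests on assumptions that differ from the ensembles in the text and should be read as an illustration only: it uses 2 x 2 real symmetric matrices with Gaussian entries (diagonal variance 1, off-diagonal variance 1/2), for which the eigenvalue gap has a closed form and, after rescaling to mean spacing 1, follows this law exactly. The function names and sample size are ours.

```python
import math
import random


def surmise_cdf(x):
    """CDF of the density (pi/2) x exp(-(pi/4) x^2) on x >= 0."""
    return 1.0 - math.exp(-math.pi * x * x / 4.0)


def spacing_2x2(rng):
    """Gap between the two eigenvalues of [[a, b], [b, c]].

    Assumption: a, c ~ N(0, 1) and b ~ N(0, 1/2), all independent.
    The gap equals sqrt((a - c)^2 + 4 b^2).
    """
    a = rng.gauss(0.0, 1.0)
    c = rng.gauss(0.0, 1.0)
    b = rng.gauss(0.0, math.sqrt(0.5))
    return math.sqrt((a - c) ** 2 + 4.0 * b * b)


rng = random.Random(1)
spacings = [spacing_2x2(rng) for _ in range(100000)]

# Rescale so the mean spacing is 1, as is done with the density in the text.
mean_spacing = sum(spacings) / len(spacings)
normalized = [s / mean_spacing for s in spacings]

for x in (0.5, 1.0, 1.5):
    empirical = sum(1 for s in normalized if s <= x) / len(normalized)
    print(x, empirical, surmise_cdf(x))
```

The small probability of spacings near 0 visible in the first row is the eigenvalue repulsion that distinguishes these spacings from those of independent points.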

Below we give some numerics from investigations in the bulk of the
spectrum (i.e., normalized eigenvalues near 0). Our first example is
when $p$ is the uniform distribution on $[-1, 1]$ (Figure 8). Already for
300 x 300 matrices we see excellent agreement with the conjecture.

What about other distributions, for example the Cauchy density
$(\pi(1 + x^2))^{-1}$? We saw earlier that the density of states was decidedly
non-semi-circular. It is a different story for the spacings (Figure 9).
Already for 300 x 300 matrices we see excellent agreement with the
conjecture.

There are numerous other ensembles where the density of states is
non-semi-circular but the spacing between adjacent eigenvalues seems
to agree with the conjecture. For example, McKay [McK] proved the
density of states of $d$-regular graphs^{47} is Kesten's measure (which does
converge to the semi-circle as $d \to \infty$), and simulations by many
(including [JMRR]) see the conjectured behavior. This is also apparent in
more advanced tests, where the distribution of the largest eigenvalue
is observed to follow a $\beta = 1$ Tracy-Widom distribution (see [MNS]).
These distributions govern the largest eigenvalues in many settings;
see [TW1, TW2, TW3]. If the ensemble has a very different structure,



^{47} A graph is $d$-regular if each vertex is connected to exactly $d$ neighbors.
Figure 9. Spacings between normalized eigenvalues of
5000 Cauchy matrices; the left are 100 x 100 and the
right are 300 x 300.



however (such as the Toeplitz ensembles in [BDJ, HM, MMS]), then
both statistics could behave differently.



5. From Random Matrix Theory to Number Theory 

We now discuss the path from random matrix theory to number the- 
ory. As this story has been told numerous times, we concentrate on 
some illuminating aspects and refer the reader to the references previ- 
ously mentioned. We concentrate on a very small subset of statistics 
and connections; for example, we will almost completely ignore the 
contributions to studying moments of the zeta function. Our goal is 
to explain how the behavior of some key statistics in number theory
is the same as that of the corresponding statistics in random matrix theory.
We concentrate on $\zeta(s)$ and its simplest generalization, Dirichlet
L-functions, though these results hold for a larger class of L-functions
as well (exactly how large a class is the subject of many research
programs).

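The starting point of our analysis is Riemann's Explicit Formula,
which is a natural generalization of (2.5). Let $\phi(s)$ be a 'nice' function.
We have

$$\frac{1}{2\pi i} \int -\frac{\zeta'(s)}{\zeta(s)}\,\phi(s)\,ds = \frac{1}{2\pi i} \int \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}\,\phi(s)\,ds. \tag{5.1}$$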
Shifting contours, by the residue theorem the left hand side is basically
$\phi(1) - \sum_\rho \phi(\rho)$, while the right hand side is (for $s = \sigma + it$):

$$\frac{1}{2\pi i} \sum_{n=1}^{\infty} \Lambda(n) \int \frac{\phi(s)}{n^s}\,ds = \frac{1}{2\pi} \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^\sigma} \int_{-\infty}^{\infty} \phi(\sigma + it)\,e^{-it\log n}\,dt. \tag{5.2}$$

The integral is basically the Fourier transform^{48} (up to some constants)
of $f(t) = \phi(\sigma + it)$. A careful analysis [MT-B] gives the following
explicit formula relating sums of a test function over zeros of $\zeta(s)$ to
sums of the Fourier transform over primes.

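Theorem 5.1 (Explicit Formula). Let $\sum_\rho$ denote the sum over the
non-trivial zeros of $\zeta(s)$ (i.e., the zeros in the critical strip), $g$ an even
Schwartz function^{49} of compact support and $\phi(r) = \int_{-\infty}^{\infty} g(u)e^{iru}\,du$.
Write $\rho$ as $1/2 + i\gamma_\rho$; if the Riemann Hypothesis is true then $\gamma_\rho$ is real.
We have

$$\sum_\rho \phi(\gamma_\rho) = 2\phi\left(\frac{i}{2}\right) - \sum_p \sum_{k=1}^{\infty} \frac{2\log p}{p^{k/2}}\, g(k \log p) + \frac{1}{\pi} \int_{-\infty}^{\infty} \left( \frac{1}{iy - \frac{1}{2}} + \frac{\Gamma'(\frac{iy}{2} + \frac{5}{4})}{\Gamma(\frac{iy}{2} + \frac{5}{4})} - \frac{1}{2}\log \pi \right) \phi(y)\,dy;$$

note up to scale that $g$ and $\phi$ are essentially a Fourier transform pair.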

5.1. Preliminaries. We now come to the key moment of our story,
when Montgomery and Dyson [Mon] noticed the agreement between the
pair correlation of zeros of $\zeta(s)$ and eigenvalues of complex Hermitian
matrices. The pair correlation statistic of a set $\{x_1, x_2, \dots\}$ is

$$\lim_{N \to \infty} \frac{\#\{i \ne j \le N : x_i - x_j \in I\}}{N}, \tag{5.3}$$

where $I$ is an arbitrary interval. We can generalize this to triple correlation
(which would be how often pairs of differences are in a box) and
higher; knowing all the correlations is equivalent to knowing the spacing
between adjacent elements. Instead of using a box or hypercube
we can use a smooth test function.^{50} Odlyzko [Od1, Od2] observed
phenomenal agreement between the spacings of adjacent zeros and the corresponding
distribution for spacings between adjacent eigenvalues of complex Hermitian
matrices; see Figure 10.
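To make the pair correlation statistic (5.3) concrete, here is a minimal sketch that evaluates the finite-N quantity #{i != j <= N : x_i - x_j in I} / N for a closed interval I = [a, b]. The data are hypothetical: N independent uniform points on [0, N] (mean spacing 1), which show no repulsion, so for an interval away from 0 the count per point is roughly the length of I. This is a baseline for comparison, not a computation with zeros or eigenvalues, and the function name is ours.

```python
import bisect
import random


def pair_correlation(points, a, b):
    """Return #{i != j : x_i - x_j in [a, b]} / N for the given points."""
    xs = sorted(points)
    n = len(xs)
    count = 0
    for xj in xs:
        # Indices of all x_i with xj + a <= x_i <= xj + b.
        lo = bisect.bisect_left(xs, xj + a)
        hi = bisect.bisect_right(xs, xj + b)
        count += hi - lo
        if a <= 0.0 <= b:
            count -= 1  # exclude the pair i = j
    return count / n


# Tiny deterministic example: only the three differences equal to 1 are counted.
print(pair_correlation([0.0, 1.0, 2.0, 3.0], 0.5, 1.5))

# Hypothetical data: independent uniform points with mean spacing 1.
rng = random.Random(2)
n_points = 5000
uniform_points = [rng.uniform(0.0, n_points) for _ in range(n_points)]
print(pair_correlation(uniform_points, 0.5, 1.5))
```

For eigenvalues or zeros rescaled to mean spacing 1, the same function applied to a short interval near 0 would return a noticeably smaller value than for independent points, reflecting repulsion.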



^{48} The Fourier transform of $g$ is $\hat g(\xi) = \int_{-\infty}^{\infty} g(x)e^{-2\pi i x \xi}\,dx$.

^{49} This means for any $m, n \ge 0$ that $\lim_{|x| \to \infty} (1 + x^2)^m g^{(n)}(x) = 0$ (i.e., $g$ and
all its derivatives tend to zero faster than the reciprocal of any polynomial).

^{50} We want (1) $f(x_1, \dots, x_n)$ is symmetric; (2) $f(x + t(1, \dots, 1)) = f(x)$ for $t \in \mathbb{R}$;
(3) $f(x) \to 0$ rapidly as $|x| \to \infty$ in the hyperplane $\sum_j x_j = 0$; see [RS].
Figure 10. 70 million spacings between adjacent zeros
of $\zeta(s)$, starting at the $10^{20}$th zero, versus the corresponding
results for eigenvalues of complex Hermitian matrices
(from Odlyzko).



Hejhal [Hej] proved (for suitable test functions) that the triple correlation
of $\zeta(s)$ agrees with random matrix theory, while Rudnick and
Sarnak [RS] showed agreement with the n-level correlations of any
L-function attached to a cuspidal automorphic representation of $\mathrm{GL}_m/\mathbb{Q}$.
To describe these L-functions in detail would be too much of a digression,
so we will content ourselves with a very brief introduction,
referring the interested reader to [RS] for details. The Riemann zeta
function

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{p\ \mathrm{prime}} \left(1 - \frac{1}{p^s}\right)^{-1} \tag{5.4}$$

has an Euler product, a functional equation, and is conjectured to have
all of its zeros in the critical strip $0 < \Re(s) < 1$ on the line $\Re(s) = 1/2$.
The generalization is a series such as $\sum_{n=1}^{\infty} a_n/n^s$, where the $a_n$ are
of arithmetic interest. In order to call this series an L-function, we
require it to have certain properties (such as an Euler product and a
functional equation, as well as some growth rates on the $a_n$'s).
L-functions arise throughout number theory.^{51}
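The equality of the Dirichlet series and the Euler product for the Riemann zeta function can be checked numerically where both converge. The sketch below is an illustration under our own choices: it works at s = 2 (where the common value is known to be pi^2/6), truncates the sum at 100000 terms and the product at primes up to 100000, and uses helper names that are not from the paper.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: list of primes p <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            # Cross out the multiples p*p, p*p + p, ... up to n.
            sieve[p * p::p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]


def zeta_partial_sum(s, terms):
    """Partial sum of sum_{n >= 1} 1 / n^s."""
    return sum(1.0 / n ** s for n in range(1, terms + 1))


def zeta_partial_product(s, bound):
    """Partial Euler product prod_{p <= bound} (1 - p^(-s))^(-1)."""
    product = 1.0
    for p in primes_up_to(bound):
        product *= 1.0 / (1.0 - p ** (-s))
    return product


s = 2
print("Dirichlet series:", zeta_partial_sum(s, 100000))
print("Euler product:   ", zeta_partial_product(s, 100000))
print("pi^2 / 6:        ", math.pi ** 2 / 6)
```

Both truncations approach the same limit from below, since every omitted term of the series is positive and every omitted factor of the product exceeds 1.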

In random matrix theory, in order to understand the behavior of the
eigenvalues of one matrix $A$ we embedded it in a family of random
matrices, and showed that with high probability the behavior of the
eigenvalues of $A$ is close to the ensemble average (at least as $N \to \infty$).
For the n-level correlations, we do not need to perform any averaging
over L-functions; we may study an individual L-function. The reason
is that the average spacing between zeros in the critical strip whose
imaginary part is of size $T$ is (up to some constants) of size $1/\log T$; in other words,
the higher up we go, the more densely the zeros of an L-function are
packed together, and thus one L-function provides enough zeros high
up to average.

The results mentioned above suggested that, for the purposes of
number theory, it sufficed to know how random complex Hermitian
matrices behave, as the zeros of $\zeta(s)$ (and other L-functions) high up on
the critical line showed remarkable agreement with these eigenvalues.
This turned out, however, to only be part of the story. The reason is
that the n-level correlations are insensitive to finitely many zeros. In
other words, if we were to remove the 1701 zeros nearest to the central
point $s = 1/2$, the n-level correlations would not change.^{52} This is
a major problem for number theory, as often we expect there to be
behavior of arithmetic interest at the central point.^{53}

Katz and Sarnak [KaSa1, KaSa2] showed that, as the size of the
matrices tends to infinity, the n-level correlations of complex Hermitian
matrices also equal those of N x N unitary matrices, as well as those of its
orthogonal and symplectic subgroups.^{54} Thus when we say that the



^{51} The earliest occurrence was probably in Dirichlet's work, who used L-functions
attached to characters on $(\mathbb{Z}/m\mathbb{Z})^*$ to study primes in arithmetic progressions.
Another example is elliptic curve L-functions, which (at least conjecturally) give
information about the Mordell-Weil group of rational solutions of the
elliptic curve.

^{52} This is because the zeros are tending to infinity. Thus, given any zero and any
finite box, only finitely many zeros can be associated to it such that the required
differences lie in the box. Therefore, this zero has negligible contribution in the
limit as we are dividing by $N$.

^{53} For example, the Birch and Swinnerton-Dyer conjecture states the order of
vanishing at the central point of an elliptic curve L-function equals the rank of the
Mordell-Weil group of rational solutions of the elliptic curve; this is quite important
information which we do not wish to discard!

^{54} These classical compact groups are much more natural random matrix ensembles.
In our original formulation, we chose the matrix elements randomly and independently
from some probability distribution $p$. What should we take for $p$? The
zeros behave like eigenvalues of complex Hermitian matrices, we could
have also said they behave like eigenvalues of unitary matrices (or one
of their subgroups).^{55} We thus need a new statistic to study which will
'break' this symmetry, and say which ensemble is truly modeling the
behavior. Further, our statistic should take into account the behavior
near the central point, as interesting arithmetic occurs there. One
popular choice is the 1-level density, which we now describe.



5.2. 1-level Density (Preliminaries). Let $\phi(x)$ be an even Schwartz
function. This means for any $m, n \ge 0$ that $\lim_{|x| \to \infty} (1 + x^2)^m \phi^{(n)}(x) = 0$
(i.e., $\phi$ and all its derivatives tend to zero faster than the reciprocal of any polynomial).
We also assume the Fourier Transform^{56} $\hat\phi$ is compactly supported; this
means there is some $\sigma < \infty$ such that $\hat\phi(\xi) = 0$ if $|\xi| > \sigma$.
Consider an L-function

$$L(s, f) = \sum_{n=1}^{\infty} \frac{\lambda_f(n)}{n^s}; \tag{5.5}$$

we assume this series converges for $\Re(s) > 1$, has a meromorphic extension
to all of $\mathbb{C}$ satisfying a functional equation, and has an Euler
product.^{57} We have remarked that high up, the spacing between zeros
is like $1/\log T$ at height $T$; what is it near $s = 1/2$? The answer can
be deduced from an analysis of the functional equation, which shows
there is some number $C_f$ (called the analytic conductor) such that the
zeros near the central point are spaced on the order of $1/\log C_f$. This
suggests we study the following statistic:



GOE and GUE ensembles, where the entries are chosen from Gaussians, arise by
imposing invariance on $\mathrm{Prob}(A)\,dA$ under orthogonal (respectively unitary)
transformations; these are natural conditions to impose as the probability of a
transformation should be independent of the coordinate system used to write it down. The
classical compact groups come endowed with a natural probability, namely Haar
measure.

^{55} The eigenvalues of a unitary matrix are of the form $e^{i\theta}$. To see this, let $v$ be an
eigenvector of $U$ with eigenvalue $\lambda$. Note $v^* U^* U v = v^* v$, which gives $|\lambda|^2 \|v\|^2 = \|v\|^2$,
so $|\lambda| = 1$. Thus, similar to real symmetric and complex Hermitian matrices,
we can parametrize the eigenvalues by a real quantity.

^{56} We use the normalization $\hat\phi(\xi) = \int_{-\infty}^{\infty} \phi(x)e^{-2\pi i x \xi}\,dx$. The Fourier transform
has many nice properties on the Schwartz space (see [SS] for example).

^{57} These are very strong conditions, and most choices of $\lambda_f(n)$ will not satisfy
these requirements. Fortunately there are many choices that do work, and these
frequently encode information about arithmetically interesting problems. The most
studied examples include Dirichlet L-functions, modular and Maass forms; see [IK]
for details.
Definition 5.2 (The 1-level density). Let $\phi$ be an even Schwartz function
and $L(s, f)$ an L-function as above, with non-trivial zeros $1/2 + i\gamma_{f,j}$.
The 1-level density is

$$D_f(\phi) = \sum_j \phi\left(\gamma_{f,j}\,\frac{\log C_f}{2\pi}\right). \tag{5.6}$$

We may generalize the above to n-level densities; while for some applications
it is essential to understand these generalizations,^{58} for many
purposes studying the 1-level density suffices. This statistic differs in
several important ways from the n-level correlation.

The first is that individual zeros now contribute in the limit. More- 
over, most of the contribution is from the zeros near the central point 
(thus this statistic is sensitive to what is happening there). This is
because $\phi$ is of rapid decay, so once we are a couple of average spacings
away, there is negligible contribution. There is a trade-off, namely it
no longer suffices to study just one L-function. The reason is we always 
need something to average over, and there are just too few zeros near 
the central point on this scale. The solution is to look at the zeros near 
the central point for many L-functions that share common properties. 
We average the 1-level densities over the family. Unlike the n-level cor- 
relations, where we are looking high up on the critical line and see the 
same behavior in all L-functions, we see very different behavior near 
the central point, depending on what family of L-functions we study. 

Katz and Sarnak [KaSa1, KaSa2] conjecture that for any 'nice' family
of L-functions, as the conductors tend to infinity the 1-level density of
the family agrees with the $N \to \infty$ scaling limit of a classical compact
group (typically N x N unitary, orthogonal or symplectic matrices).
Moreover, these groups all have distinguishable behavior. In other 
words, the universality seen in the n-level correlations is broken. 

Before describing the proof, we give some examples of families of 
L-functions and the corresponding symmetries. 

(1) Dirichlet L-functions: Let $m$ be a prime and consider all
non-principal^{59} Dirichlet characters $\chi$ from $(\mathbb{Z}/m\mathbb{Z})^*$ to the complex
numbers of absolute value 1. To each character $\chi$ we have an
L-function $L(s, \chi) = \sum_n \chi(n)/n^s = \prod_p (1 - \chi(p)p^{-s})^{-1}$. As
$m \to \infty$, the behavior agrees with the scaling limit of unitary
matrices.^{60}
^{58} For example, there are three flavors of orthogonal symmetry, and their 1-level
densities are indistinguishable if the support of $\hat\phi$ is contained in $(-1, 1)$; however,
the 2-level densities of all three are distinguishable for arbitrarily small support
[Mil1]. Another example is in obtaining better decay rates for high vanishing at
the central point in a family [HM].

^{59} This means we avoid the trivial character $\chi_0(n) = 1$ if $n$ is relatively prime to
$m$ and 0 otherwise; this character gives rise to a simple modification of $\zeta(s)$.
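(2) Cuspidal newforms: Let

$$\Gamma_0(N) = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} : a, b, c, d \in \mathbb{Z},\ ad - bc = 1,\ c \equiv 0 \bmod N \right\}; \tag{5.7}$$

$f$ is a weight $k$ holomorphic cuspform of level $N$ if, for all $\gamma \in \Gamma_0(N)$,

$$f(\gamma z) = (cz + d)^k f(z), \tag{5.8}$$

where $\gamma z = (az + b)/(cz + d)$. As $k$ or $N$ tend to infinity, the
behavior agrees with the scaling limit of orthogonal matrices.^{61}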

(3) Elliptic curves: Let $\mathcal{E} : y^2 = x^3 + A(T)x + B(T)$ be an elliptic
curve over $\mathbb{Q}(T)$. For each $t \in \mathbb{Z}$ we can specialize and get an
elliptic curve $E_t : y^2 = x^3 + A(t)x + B(t)$. We can build an
L-function, where $a_p(t)$ is related to the number of solutions to
$y^2 = x^3 + A(t)x + B(t) \bmod p$. Our family is now $t \in [X, 2X]$
with $X \to \infty$, and these families have orthogonal symmetry.

In these and many other cases (see [DM1, FI, Gao, Gu, HM, HR,
ILS, KaSa2, Mil1, Mil5, Ro, Rub, Yo2] for a representative sampling of
results), we can show for suitably restricted $\phi$ that the 1-level density
agrees with the scaling limit of one of the mentioned classical compact
groups.

5.3. 1-level Density (Proofs). We briefly describe how the proofs
proceed. We concentrate on the family of Dirichlet characters with
prime conductor $m$ tending to infinity. As in the proof of Wigner's
semi-circle law, there are three steps.

• We first must determine the correct scale to study the zeros near
the central point. The answer can be shown to follow from the functional
equation; in this case, we normalize the zeros by the factor
$(\log(m/\pi))/(2\pi)$.

^{60} We could consider the related family of quadratic Dirichlet characters coming
from a fundamental discriminant $d \in [X, 2X]$ with $X \to \infty$; this family agrees with
the scaling limit of symplectic matrices.

^{61} There are actually three flavors of orthogonal groups. If all the signs of the
functional equation are even, the corresponding group should be SO(2N), and
SO(2N+1) if the signs are all odd. We refer the reader to the previously mentioned
surveys for the details.
• The next step is to relate what we want to study, namely the zeros
near the central point, to something we have a chance of understanding,
in this case the coefficients of the L-function, $\chi(n)$. We do this
with a generalization of Riemann's explicit formula (this is the analogue
of the Eigenvalue Trace Lemma). Our result is a straightforward
generalization of the formula Riemann used to connect the zeros of
$\zeta(s)$ with the distribution of primes. The difference here is that we
have a more general test function. It can be shown (see [ILS, RS]) that
for $\phi$ an even Schwartz function and $L(s, \chi) = \sum_n \chi(n)/n^s$ a Dirichlet
L-function from a non-trivial character $\chi$ with conductor $m$ and zeros
$\rho = \frac{1}{2} + i\gamma_\chi$, then

$$\begin{aligned}
\sum_{\gamma_\chi} \phi\left(\gamma_\chi\,\frac{\log(m/\pi)}{2\pi}\right) = {} & \int_{-\infty}^{\infty} \phi(y)\,dy \\
& - \sum_p \frac{\log p}{\log(m/\pi)}\,\hat\phi\left(\frac{\log p}{\log(m/\pi)}\right) \frac{\chi(p) + \bar\chi(p)}{p^{1/2}} \\
& - \sum_p \frac{\log p}{\log(m/\pi)}\,\hat\phi\left(2\,\frac{\log p}{\log(m/\pi)}\right) \frac{\chi^2(p) + \bar\chi^2(p)}{p} + O\left(\frac{1}{\log m}\right).
\end{aligned} \tag{5.9}$$



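Note the left hand side is a sum over zeros and the right hand side
a sum over the coefficients in the L-function. We also have $\hat\phi$ on the
right hand side. We now see how the support condition enters. As we
assume $\hat\phi$ is supported in $(-\sigma, \sigma)$, this restricts the sums on the right
to having only finitely many terms. This is the 1-level density for one
Dirichlet L-function $L(s, \chi)$. We now average over all non-principal $\chi$
(i.e., $\chi \ne \chi_0$). We will see below that there are $m - 2$ such characters,
and thus we obtain

$$\begin{aligned}
\frac{1}{m-2} \sum_{\chi \ne \chi_0} \sum_{\gamma_\chi} \phi\left(\gamma_\chi\,\frac{\log(m/\pi)}{2\pi}\right) = {} & \int_{-\infty}^{\infty} \phi(y)\,dy \\
& - \frac{2}{m-2} \sum_p \frac{\log p}{\log(m/\pi)}\,\hat\phi\left(\frac{\log p}{\log(m/\pi)}\right) p^{-1/2} \sum_{\chi \ne \chi_0} \chi(p) \\
& - \frac{2}{m-2} \sum_p \frac{\log p}{\log(m/\pi)}\,\hat\phi\left(2\,\frac{\log p}{\log(m/\pi)}\right) p^{-1} \sum_{\chi \ne \chi_0} \chi^2(p) + O\left(\frac{1}{\log m}\right).
\end{aligned} \tag{5.10}$$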



• Similar to the proof of Wigner's semi-circle law, our explicit formula
would be useless unless we can perform the averaging over the
family. We briefly review some needed results about these characters
(see for instance [MT-B]), and then show how to handle the sums.
Unfortunately, the averaging formulas in number theory are significantly
worse than the corresponding averaging formulas in random matrix
theory; this leads to far more restricted results in number theory.^{62}

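For $m$ prime, the group $(\mathbb{Z}/m\mathbb{Z})^*$ is cyclic of order $m - 1$;^{63} denote
its generator by $g$. Let $\zeta_{m-1} = e^{2\pi i/(m-1)}$. The principal character $\chi_0$
is given by

$$\chi_0(k) = \begin{cases} 1 & \text{if } (k, m) = 1 \\ 0 & \text{if } (k, m) > 1. \end{cases} \tag{5.11}$$

As the characters are group homomorphisms from $(\mathbb{Z}/m\mathbb{Z})^*$ to complex
numbers of absolute value 1, the $m - 2$ primitive characters are
determined by their action on $g$.^{64} Thus there exists an $\ell$ such that
$\chi(g) = \zeta_{m-1}^{\ell}$. A simple calculation (use the explicit representation for
the characters and the geometric series formula) shows that

$$\sum_\chi \chi(k) = \begin{cases} m - 1 & \text{if } k \equiv 1 \bmod m \\ 0 & \text{otherwise,} \end{cases} \tag{5.12}$$

where we are summing over all characters, including the principal one.
It is easy to remove the contribution from the principal character, and
we find for any prime $p \ne m$

$$\sum_{\chi \ne \chi_0} \chi(p) = \begin{cases} -1 + (m - 1) & \text{if } p \equiv 1 \bmod m \\ -1 & \text{otherwise.} \end{cases} \tag{5.13}$$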
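The character sums used in the averaging step can be verified by brute force for a small prime. The sketch below is illustrative only: the modulus m = 17 is a hypothetical choice, the helper names are ours, and the characters are built exactly as described in the text, by fixing a generator g of (Z/mZ)* and sending g to a power of the root of unity exp(2 pi i/(m-1)). It checks that the sum over all characters of chi(k) is m - 1 when k is 1 mod m and 0 otherwise, and that the sum over non-principal characters of chi(p) is m - 2 when p is 1 mod m and -1 otherwise.

```python
import cmath


def find_generator(m):
    """Smallest generator of (Z/mZ)^* for an odd prime m."""
    for g in range(2, m):
        x, order = g, 1
        while x != 1:
            x = (x * g) % m
            order += 1
        if order == m - 1:
            return g
    raise ValueError("no generator found; m should be an odd prime")


def dirichlet_characters(m):
    """All m - 1 characters mod a prime m; index 0 is the principal one."""
    g = find_generator(m)
    discrete_log = {}
    x = 1
    for exponent in range(m - 1):
        discrete_log[x] = exponent      # x = g^exponent mod m
        x = (x * g) % m

    def make_character(ell):
        def chi(n):
            n %= m
            if n == 0:
                return 0j               # chi vanishes on multiples of m
            angle = 2.0 * cmath.pi * ell * discrete_log[n] / (m - 1)
            return cmath.exp(1j * angle)
        return chi

    return [make_character(ell) for ell in range(m - 1)]


m = 17                                   # hypothetical small prime modulus
characters = dirichlet_characters(m)

# Sum over all characters, including the principal one.
for k in (1, 2, 18, 35):
    total = sum(chi(k) for chi in characters)
    print("k =", k, "sum over all chi:", round(total.real, 9))

# Sum over the m - 2 non-principal characters at primes p != m.
for p in (2, 3, 103):                    # 103 = 6 * 17 + 1
    total = sum(chi(p) for chi in characters[1:])
    print("p =", p, "sum over chi != chi_0:", round(total.real, 9))
```

Only primes congruent to 1 modulo m give a large contribution, which is why the subsequent bounds reduce to counting primes in that single arithmetic progression.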



^{62} In fact, the case being studied here has the best averaging formula! For cuspidal
newforms we have the Petersson formula, and for families of elliptic curves we
can use periodicity in evaluating Legendre sums of cubic polynomials. In general,
however, unless our family is obtained in some manner from a family such as one
of these, we do not possess the needed averaging formula.

^{63} $(\mathbb{Z}/m\mathbb{Z})^*$ is the set $\{1, 2, \dots, m - 1\}$ where multiplication and addition are
modulo $m$; for example, if $m = 17$, $x = 11$ and $y = 9$ then $xy = 99 = 5 \cdot 17 + 14$, so
$xy \equiv 14 \bmod 17$, while $x + y = 20 \equiv 3 \bmod 17$. The hardest step in proving that
this is a group under multiplication is finding inverses; one way to accomplish this
is with the Euclidean algorithm.

^{64} These properties mean $\chi(1) = 1$, $\chi(xy) = \chi(x)\chi(y)$ and $\chi(x^r) = \chi(x)^r$. Further,
though initially only defined on $(\mathbb{Z}/m\mathbb{Z})^*$, we extend the definition to all of $\mathbb{Z}$
by setting $\chi(n + \ell m) = \chi(n)$. As $g^{m-1} \equiv 1 \bmod m$, $\chi(g^{m-1}) = 1$ or $\chi(g)^{m-1} = 1$
for all $\chi$. Thus $\chi(g)$ has to be an $(m-1)$-st root of unity. We see each of these roots
gives rise to a character (one of which is the principal or trivial character); by
multiplicativity once we know the character's action on the generator we know it
everywhere.





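This is the desired averaging formula. We substitute it into

$$\frac{2}{m-2} \sum_p \frac{\log p}{\log(m/\pi)}\,\hat\phi\left(\frac{\log p}{\log(m/\pi)}\right) p^{-1/2} \sum_{\chi \ne \chi_0} \chi(p).$$

We write $f(x) \ll g(x)$ if there is a $C$ such that for all $x$ sufficiently
large, $|f(x)| \le C g(x)$. Our function $\hat\phi$ is bounded, and as $p < m^\sigma$,
$\log p \ll \sigma \log(m/\pi)$ as $m \to \infty$. A simple calculation shows there is no
contribution if $\sigma < 2$:

$$\begin{aligned}
& \frac{-2}{m-2} \sum_{p \le m^\sigma} \frac{\log p}{\log(m/\pi)}\,\hat\phi\left(\frac{\log p}{\log(m/\pi)}\right) p^{-1/2}
 + 2\,\frac{m-1}{m-2} \sum_{\substack{p \le m^\sigma \\ p \equiv 1 (m)}} \frac{\log p}{\log(m/\pi)}\,\hat\phi\left(\frac{\log p}{\log(m/\pi)}\right) p^{-1/2} \\
& \qquad \ll \frac{1}{m} \sum_{p \le m^\sigma} p^{-1/2} + \sum_{\substack{p \le m^\sigma \\ p \equiv 1 (m)}} p^{-1/2} \\
& \qquad \ll \frac{1}{m} \sum_{k \le m^\sigma} k^{-1/2} + \sum_{\substack{k \le m^\sigma,\ k \equiv 1 (m) \\ k \ge m + 1}} k^{-1/2} \\
& \qquad \ll \frac{1}{m} \sum_{k \le m^\sigma} k^{-1/2} + \frac{1}{m} \sum_{k \le m^\sigma} k^{-1/2} \\
& \qquad \ll \frac{1}{m}\, m^{\sigma/2}.
\end{aligned}$$

It is conjectured that there should be no contribution for any finite $\sigma$;^{65}
extending this further is related to some of the deepest questions about
how primes are distributed in arithmetic progressions.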

5.4. Nuclear Physics Interpretation. It is interesting to interpret 
our results on the 1-level density in the language of nuclear physics: 



Zeros of L-functions      <-->  Energy levels of heavy nuclei
Schwartz test function    <-->  Neutron
Support of test function  <-->  Neutron Energy.



^{65} This is not surprising, as the above argument is quite crude, where we have
inserted absolute values and thus lost the expected cancelation due to the terms
having opposite sign.
We expand on the above. Assuming the (Generalized) Riemann
Hypothesis,^{66} the zeros of our L-functions lie on the critical line $\Re(s) =
1/2$. Thus we may order them, and it makes sense to talk about spacings
between adjacent zeros. Following the Pólya-Hilbert dream (and
there are numerous people pursuing this), we may even try to search
for a physical system whose energy levels are these zeros! Regardless,
we have two sequences of real numbers: the zeros of our L-function(s)
and the energy levels of our heavy nucleus (nuclei).

How do we understand the structure of the nucleus? We bombard it 
with low energy neutrons, and see what happens. The analogy on the 
number theory side is we 'bombard' the zeros of our L-function with 
our Schwartz test function; what we 'see' now is a sum over primes. 

In physics, ideally we would like to be able to send in a neutron 
with any energy; unfortunately, current technology only allows us to 
send in neutrons with energy in a given band. Thus we cannot obtain 
perfect information about the internal structure of the nucleus and its 
energy levels. This corresponds exactly with number theory, where 
the support of the Fourier transform of our Schwartz test function is 
playing the role of the neutron's energy. Bombarding the zeros with a 
test function is equivalent to summing the Fourier transform against 
related quantities, and our averaging formulas are only able to handle 
certain restricted sums. 

It is worth dwelling on this last observation a little more. The
Heisenberg Uncertainty Principle can be recast in mathematical terms as a
statement about a function and its Fourier transform, namely it is not
possible to simultaneously localize $f$ and $\hat{f}$ (i.e., the product
of the variances, their spreads about their means, cannot be too small).
Ideally we would like to take $\delta(x - a)$ as our test function, as this
would allow us to understand whether or not there are zeros at $a$. In
particular, if we take $a = 0$ we would understand the behavior at the
critical point. Unfortunately the Fourier transform of $\delta(x)$ is
identically 1; this corresponds to having absolutely no control on the
prime sum side.
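The trade-off described above can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes the Fourier transform convention $\hat{f}(\xi) = \int f(x) e^{-2\pi i x \xi}\,dx$ and uses the Gaussians $f(x) = e^{-x^2/(2s^2)}$ as illustrative test functions. The helper names `variance` and `gaussian_spreads` are hypothetical. Under these assumptions the product of the two spreads stays at $1/(16\pi^2)$ for every $s$, so narrowing $f$ toward a delta function forces $\hat{f}$ to spread out.

```python
import math


def variance(density, half_width, n=20001):
    """Variance of the probability density proportional to `density`,
    approximated by a Riemann sum on [-half_width, half_width]."""
    step = 2.0 * half_width / (n - 1)
    xs = [-half_width + k * step for k in range(n)]
    weights = [density(x) for x in xs]
    total = sum(weights)
    mean = sum(x * w for x, w in zip(xs, weights)) / total
    return sum((x - mean) ** 2 * w for x, w in zip(xs, weights)) / total


def gaussian_spreads(s):
    """Spreads of |f|^2 and |f_hat|^2 for f(x) = exp(-x^2 / (2 s^2))."""
    f_sq = lambda x: math.exp(-x * x / (s * s))
    # Assumed convention: f_hat(xi) = integral of f(x) exp(-2 pi i x xi) dx,
    # so |f_hat(xi)|^2 is proportional to exp(-4 pi^2 s^2 xi^2).
    f_hat_sq = lambda xi: math.exp(-4.0 * math.pi ** 2 * s * s * xi * xi)
    var_x = variance(f_sq, 12.0 * s)
    var_xi = variance(f_hat_sq, 12.0 / (2.0 * math.pi * s))
    return var_x, var_xi


if __name__ == "__main__":
    for s in (2.0, 0.5, 0.1):
        var_x, var_xi = gaussian_spreads(s)
        # As s shrinks, var_x shrinks and var_xi grows; the product is fixed.
        print(s, var_x, var_xi, var_x * var_xi)
```

Gaussians are chosen only because they attain the bound exactly; any other Schwartz function would give a strictly larger product.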

5.5. Future avenues. Random matrix theory has enjoyed remarkable 
success in suggesting questions and predicting answers for number the- 
ory. The n-level correlations and densities are two of many examples. 
One problem, however, is that random matrix theory often cannot de- 
tect the arithmetic of the L-functions, and this must be added (in a 
sometimes unsatisfying manner). A terrific example of this is in the 
study of moments of L-functions (see [CGo, CGh, CFKRS, KeSn1,
KeSn2] and the references therein); see also the discussion below on
the hybrid product formulas of Gonek, Hughes and Keating.

Footnote: This just asserts that any 'nice' L-function has all of its zeros in
the critical strip having real part equal to 1/2.

For families of L-functions $\{L(s, f_i)\}$ ($i \in \{1, 2, \dots, J\}$), we
can form the Rankin-Selberg convolution and study the resulting family
of convolutions; see [IK] for details. Given a family
$\{L(s, f_j)\}_{f_j \in \mathcal{F}_j}$, the Katz-Sarnak
conjecture states that the behavior of zeros near the central point (as
the conductors tend to infinity) agrees with the $N \to \infty$ scaling
limit of a subgroup of the unitary matrices $U(N)$. A natural question to
ask is how the behavior of zeros in the family of convolutions is related
to the behavior of the constituent families. Dueñez-Miller show that if
the families of L-functions are 'nice' (see [DM2] for statements and
proofs), then we can attach a symmetry constant $c(\mathcal{F}_j)$ to each
family satisfying the following conditions:

(1) $c(\mathcal{F}_j)$ is 0 if the family has unitary symmetry, 1 if the
family has symplectic symmetry and $-1$ if the family has orthogonal
symmetry;

(2) $c(\mathcal{F}_i \times \mathcal{F}_j) = c(\mathcal{F}_i) \times c(\mathcal{F}_j)$.

In other words, for many families the symmetry type of the convolution
is the product of the symmetry types. This leads to a very nice map
from families of L-functions to $\{0, \pm 1\}$.
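The two conditions above can be sketched in a few lines. This is only an illustration of the multiplication rule for symmetry constants; the names `Symmetry` and `convolve` are hypothetical and do not come from [DM2], and the sketch says nothing about which families are 'nice' enough for the theorem to apply.

```python
from enum import Enum


class Symmetry(Enum):
    """Symmetry types labelled by their symmetry constants, as in condition (1)."""
    UNITARY = 0
    SYMPLECTIC = 1
    ORTHOGONAL = -1


def convolve(*types):
    """Symmetry type of a convolution of families, using condition (2):
    the symmetry constant of the convolution is the product of the constants."""
    constant = 1
    for symmetry in types:
        constant *= symmetry.value
    return Symmetry(constant)


if __name__ == "__main__":
    print(convolve(Symmetry.ORTHOGONAL, Symmetry.ORTHOGONAL))
    print(convolve(Symmetry.ORTHOGONAL, Symmetry.SYMPLECTIC))
    print(convolve(Symmetry.UNITARY, Symmetry.SYMPLECTIC))
```

Under this rule two orthogonal families convolve to a symplectic one, and any convolution involving a unitary family is unitary.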

Another problem is that the main term in the 1-level density agrees 
with random matrix theory, but the arithmetic of the family does not 



Footnote: If $L(s, f_i) = L_\infty(s) \prod_p \prod_{j=1}^{n_i} (1 - \alpha_{j,i}(p) p^{-s})^{-1}$, then
$L(s, f_1 \times f_2)$ is related to a product over primes of
$\prod_{j=1}^{n_1} \prod_{k=1}^{n_2} (1 - \alpha_{j,1}(p) \alpha_{k,2}(p) p^{-s})^{-1}$. It is conjectured
that these functions have functional equations, satisfy the Riemann hypothesis, et
cetera. The existence of the Rankin-Selberg convolution is known for just a few
choices of the $f_i$'s.

Footnote: There are several difficulties with the proofs in general, ranging from
not knowing properties of the Rankin-Selberg convolution in general to not having
a good averaging formula over general families.

Footnote: A special case of this theorem was discovered by Dueñez-Miller in [DM1]
in studies of a family of GL(4) and a family of GL(6) L-functions. The analysis
there led to a disproof of a folklore conjecture that the theory of low-lying zeros
of L-functions is equivalent to a theory of the distribution of signs of the
functional equations in the family; see [DM1, DM2] for details. The key ingredient
in the proofs is the universality of the second moments of the Satake parameters
$\alpha_{j,i}(p)$; this is similar to the universality found by Rudnick and Sarnak [RS]
in the n-level correlations. The higher moments of the Satake parameters control
the rate of convergence to the random matrix predictions.

Footnote: Instead of attaching only a symmetry constant, one can attach a symmetry
vector which incorporates other information, such as the rank of the family.
surface until we examine the lower order terms (which control the rate
of convergence; see for example [FI, Mil5, Yo1]). One promising line
of research is the L-functions Ratios Conjecture [CFZ1, CFZ2], which
is supported by corresponding calculations for random matrix ensembles
(see [CS, GJMMNPP, MiIi, MM, MilMo, Sto] for some recent
work supporting these conjectures, especially [CS] for a very accessible
introduction to the method and a summary of its successes). Another
approach is through hybrid product formulas [GHK]. A typical
L-function has two product representations, one as an Euler product
over primes, and one as a Hadamard product over its zeros. In this
approach an L-function is modeled by the product of a partial Euler and a
partial Hadamard product. The Hadamard piece is believed to be well
modeled by random matrix theory, while the Euler product introduces
the arithmetic. Thus the interplay between random matrix theory and
number theory continues, and what began as a chance meeting in the
1970's now yields over 1,000,000 hits on a Google search (as of August
2009).
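To make the notion of a partial Euler product concrete, here is a minimal sketch for the Riemann zeta function at $s = 2$, where the full Euler product converges to $\zeta(2) = \pi^2/6$. This is a deliberate simplification and an assumption of the example: the hybrid formulas of [GHK] work near the critical line and pair the truncated Euler product with a partial Hadamard product, neither of which is attempted here. The function names `primes_up_to` and `partial_euler_product` are hypothetical.

```python
import math


def primes_up_to(x):
    """All primes p <= x, found with the sieve of Eratosthenes."""
    if x < 2:
        return []
    sieve = bytearray([1]) * (x + 1)
    sieve[0] = sieve[1] = 0
    for n in range(2, math.isqrt(x) + 1):
        if sieve[n]:
            # Cross out the multiples of n, starting from n * n.
            sieve[n * n::n] = bytearray(len(range(n * n, x + 1, n)))
    return [n for n in range(2, x + 1) if sieve[n]]


def partial_euler_product(s, x):
    """Product over primes p <= x of (1 - p^(-s))^(-1), for real s > 1."""
    product = 1.0
    for p in primes_up_to(x):
        product *= 1.0 / (1.0 - p ** (-s))
    return product


if __name__ == "__main__":
    target = math.pi ** 2 / 6.0
    for x in (10, 100, 1000):
        value = partial_euler_product(2.0, x)
        print(x, value, target - value)
```

The truncation point plays the role of a tunable parameter: more primes bring in more of the arithmetic, at the cost of a longer product.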